Present danger


There is much hype about predicting and preventing future pandemics, but not enough is being done about a threat sitting under our noses.


If the deadly disease MERS-CoV evolves to spread easily between humans and cause a global outbreak, hard questions will be asked.

Why did health authorities and scientists allow a virus with clear pandemic potential to fester for so long, and what more could have been done to nip it in the bud? Those questions need to be asked now, when there is still time to deal with the crisis.

As of 16 June, the World Health Organization (WHO) had reported 701 lab-confirmed cases of MERS-CoV (Middle East respiratory syndrome coronavirus), including 249 deaths, since the virus was first identified in September 2012. The reported cases are largely confined to the Middle East, with most in Saudi Arabia.

MERS-CoV is, in principle, eminently stoppable. It remains an animal-borne virus that sporadically infects humans: there have been large hospital outbreaks in which patients have infected health-care workers and others, but so far the virus does not spread easily between people. By tracking down its animal sources and the routes through which people contract it, authorities should be able to dam the stream of infections.

But there is a risk that MERS-CoV, like the coronavirus SARS (severe acute respiratory syndrome), might mutate to spread easily between humans and so propagate rapidly around the world. SARS was detected in late 2002 and stamped out in July 2003; in those few months, it caused more than 8,000 infections and 700 deaths. Key to the defeat of SARS was a tightly coordinated international public-health effort, led by the WHO. The organization assembled an effective in-house outbreak-response team and quickly put together an international network of scientists that for the most part set competition aside in favour of collaboration.

Partly as a result of SARS, in 2005 the WHO's member states agreed on legally binding International Health Regulations to strengthen the international response to public-health events that occur in individual countries but potentially pose a global threat. The rules, for example, require countries to strengthen their disease surveillance and outbreak-response infrastructure, and to report all cases of possible international concern to the WHO within 24 hours.


TRY HARDER 

When it comes to MERS-CoV, the lessons of the SARS success have too often been ignored. This is perhaps due in part to a mistaken perception that MERS-CoV is less urgent than SARS was, because it does not yet spread easily between people. Research groups have tended to compete rather than cooperate. From the outset, conflict and distrust over credit, patents and sharing of specimens and data have marred efforts (see Nature http://doi.org/s75; 2013).

Saudi Arabia's response to MERS-CoV has been better than many of its critics give it credit for. Tackling the outbreak is challenging: with only a few hundred cases to go on, tracking down clues to the source of infections is not easy in a country that is almost three and a half times the size of France. But even so, response efforts have suffered from ineptitude, infighting and inadequate transparency. Saudi Arabia may be rich, but it is on a steep learning curve when it comes to international research collaboration and dealing with a complicated outbreak. In April, Saudi Arabia replaced its health minister as case numbers surged, and last month it created a Command and Control Center that brings together scientists and public-health officials to better coordinate control efforts, and acts as a focal point for international collaboration. This month, it removed deputy health minister Ziad Memish — the most prominent public face of Saudi MERS-CoV efforts — and announced 113 cases and 92 deaths that had occurred since 2012 but had gone unreported (these cases are not included in the WHO’s latest totals). It is too soon to say how effective the Command and Control Center will be, but domestic pressure to stop MERS-CoV is at an all-time high.

The WHO has been much less prominent and decisive on MERS-CoV than it was on SARS. Its outbreak-response division is underfunded and understaffed, and effective leadership has been lacking.

On the positive side, researchers have obtained a lead, finding the 
virus in camels in Saudi Arabia, Egypt, Oman and Qatar. Antibodies 
to the pathogen — evidence of past infection — have been detected in 
camels in many countries in the Middle East and North Africa. Last 
week, researchers reported finding the virus in unpasteurized camel 
milk. But almost two years after MERS-CoV was first identified, no 
one has definitively pinned down its routes of transmission to humans. 
Scientists and authorities could, and should, do better. 

The many cases caused by hospital outbreaks, for instance, could 
have been prevented by rigorous infection-control measures. Rapid 
identification and isolation of cases, decontamination of surfaces and 
use of protective clothing such as masks can all help to block infection 
of people in contact with patients. 

Outbreak response cannot always be decreed by international rules. 
There is tension between the sovereign right of nations to handle the 
situation in their own countries and the desire of the international 
community to intervene and prevent the disease crossing bor- 
ders. Diplomacy and trust are key to building an effective outbreak 
response. Saudi Arabia needs to be encouraged, not alienated. 

The International Health Regulations say little about research, but a 
separate WHO agreement sets out clear rules for sharing samples and 
sequences of pandemic influenza viruses. Similar rules for all infec- 
tious diseases that have pandemic potential are needed. 

What is most lacking in the fight against MERS-CoV is global lead- 
ership. The WHO, as an intergovernmental agency with a direct line 
to health ministries, remains best placed to bang heads together and 
get things done cooperatively, but its efforts must be well funded and 
staffed. Politicians everywhere must wake up to the fact that the world 
has another Middle East problem.


Quanundrum


Does reality exist? Fifty years on, Bell’s 
theorem still divides (and confuses) physicists. 


When it comes to Bell’s theorem, a cornerstone of modern quantum mechanics, there is one thing that everyone agrees on: it was published 50 years ago. Everything else is open to debate — especially its interpretation — and there is little prospect of these matters being settled soon. Indeed, Bell’s theorem has become synonymous with the most puzzling meeting of metaphysics and physics that science has to offer.

Nature prides itself on writing for the general reader, but explaining 
the idea published by Northern Irish physicist John Stewart Bell in 
1964 poses a stiff challenge to that mantra of accessibility. But confused 
readers can be consoled by the fact that they are not alone: even the best 
quantum physicists are left bewildered by Bell's theorem. Still, to unlock 
the secrets of the Universe, a little effort seems worthwhile. 

In short, Bell predicted that measurements on entangled quantum particles will be incompatible with at least one of two common world views.
The first is locality — the idea that a measurement on a London desk 
cannot be influenced by the setting of a measuring device in New York. 
The second is realism — that there is a reality that is independent of 
what we measure or observe. 

Before Bell, both were common assumptions in science. For most people, they still are. But for physicists who step from the physical world into the quantum universe, Bell’s theorem poses a real challenge. They must accept either that entangled quantum particles can influence each other instantaneously, even if they are light years apart, or that in the quantum world there is no Moon if nobody looks. Bell’s predictions have withstood all experimental tests so far, so it looks like we have to give up at least one dearly held, intuitive concept.

The reluctance of physicists to choose either of the possible options is illustrated by the fact that they still disagree on what exactly to make of Bell’s theorem. For example, a conference in Vienna this week to celebrate the 50th anniversary of Bell’s big idea will not merely offer a few historical perspectives and then move on to the hot topics of today. Rather, the theorem itself remains hot. (Sample talk title in Vienna: ‘My struggle to face up to unreality’.)

It is not that quantum physics has gone nowhere over the past 50 years. On the contrary: in the 1990s, quantum physics experienced a boost that has been dubbed the ‘second quantum revolution’, when the theories developed in the first revolution were translated into practical quantum technologies such as unbreakable cryptography protocols and ultrafast computing concepts. After all, we can simply use the equations of quantum mechanics to invent new technology without understanding their deeper meaning.

Still, the second quantum revolution was at least partially triggered 
by contemplations about the meaning of it all. Quantum physicist 
Artur Ekert, for instance, devised one of the key ingredients for secure 
quantum communication while pondering the meaning of Bell’s theo- 
rem (A. K. Ekert Phys. Rev. Lett. 67, 661; 1991). 

Today's quantum-physics agenda holds great promise for such 
fruitful collaboration between fundamental research and practical 
applications. For example, the search for the biggest objects that can be 
subject to quantum superposition is not only motivating theorists to 
think about possible universal distinctions between the macroscopic 
classical and the microscopic quantum world, but also prompting the 
improvement of experimental tools that will probably become useful 
in other contexts. 

See, that wasn't too hard. Was it?




Summer skills 


A fledgling neuroscience programme is a rare 
beacon of research excellence in Romania. 


Readers of vampire fiction might hesitate to peer inside an isolated house in a remote part of the Romanian region of Transylvania. Indeed, something strange was happening there this month, in the Pike Lake Pension. Much of the gently rolling farmland around the house is still worked by horsepower, but within its walls stand a couple of twenty-first-century two-photon microscopes. They were built by a group of young neuroscientists who also write the software needed to operate them. The team has used the microscopes in behavioural experiments involving specially bred mice — having gained ethical approval from the University of Medicine and Pharmacy in Transylvania’s capital, Cluj. The researchers aim to identify neural circuits in the brain, and use optical-genetics techniques at the cutting edge of modern neuroscience.

The students are part of the third annual Transylvanian Experimental Neuroscience Summer School (TENSS), established by two idealistic Romanians who had, as schoolchildren, witnessed the demise of their country’s scientific base in the political chaos that followed the collapse of communism in 1989. One of these idealists is Florin Albeanu, an assistant professor at the Cold Spring Harbor Laboratory in New York; the other is Raul Muresan, a principal investigator at the Center for Cognitive and Neural Studies in Cluj. TENSS might not be quite enough to raise the country’s science from the dead. But it may yet help to return some of the lifeblood drained from the system.

The scheme shows young scientists that it is possible to achieve uncompromising, international standards of science on Romanian soil. And this is no local-scale project. The students who participate do so only after fierce international competition for places. This year, just two students from Romanian institutions joined the 13 chosen from 122 applicants.

Muresan and Albeanu are determined that the summer school will have an experimental aspect as well as a theoretical one, partly to compensate for the dearth of experimental biology in Romania. But it also speaks to the programme's global ‘yes we can’ philosophy. Students are, in part, selected for their likelihood of contributing to similar research when they return home — whether or not their labs are wealthy. Learning to build expensive equipment, such as two-photon microscopes, which can cost hundreds of thousands of dollars, gives students the confidence to build, repair or modify whatever apparatus might be required to address the neuroscientific research questions they wish to pose.

The inspiring story has spurred many scientists from leading insti- 
tutions around the world — from Harvard University in Cambridge, 
Massachusetts, to the National Centre for Biological Sciences in 
Bangalore, India — to lecture at the course. And so far, several research 
foundations and commercial companies in different countries have 
stumped up financial or in-kind support. 

TENSS will clearly continue to need such generosity in years to come. 
But the Romanian government must emulate some of the school’s lofty 
aims — and carve out a rational, meritocratic system to educate and 
support homegrown scientists and science. The TENSS experience has 
shown that talent and enthusiasm will be available, as will the required 
curiosity — in whatever form. One day during last 
year’s summer school, a villager stared mystified 
through the open door. After some thought, he 
ventured: “That's a fine-looking sewing machine 
you have there.”
WORLD VIEW


Uprooting researchers can drive them out of science


Making early-career scientists change institutions frequently is disruptive and — with modern technology — unnecessary, says Russell Garwood.


Many people have to move jobs and homes to build their careers. Relocation is a common disruption, sometimes desired and sometimes not. But how many careers demand that people move every few years, as science does? In how many other fields are promising recruits — who often already have a decade's education behind them — expected to uproot their families and move repeatedly for the best part of another decade?

Such frequent changes of location are unsettling and detrimental to people’s personal lives. Yet there is a widespread expectation that early-career researchers should move around, to demonstrate their independence or work with new people.

This attitude partly serves as an uncomfortable reminder that some academics view junior scientists as expendable sources of cheap labour whose lives and happiness are secondary considerations. But it is also outdated, reflecting the world in which many senior scientists developed their own careers: a world in which graduates and young researchers needed to move between labs and institutions to spread their knowledge and skills and, in doing so, keep science innovative and collaborative.

The information-technology revolution of the twenty-first century has changed that. For many scientists in 2014, the physical location of a laboratory is less important than the speed of its Internet connection. If they wish, researchers can now communicate more often, and just as easily, with colleagues in a different time zone than with those in the next office.

During my current fellowship, I have worked with colleagues in the United States, Germany, Australia, Sweden and France, many of whom I have never met in person. If face-to-face interaction is essential, budget airlines allow for multiple short visits to other labs and collaborators. (I am writing this on a plane to Uppsala in Sweden for such a trip.) The day-to-day work of science has become similarly diffuse. Standardized lab equipment allows researchers to replicate experiments and results more easily than in the past, wherever the work is performed.

For some scientists, of course, the opportunity to move around is wonderful. It is perfect for people with wanderlust, who lack personal ties or who thrive in varied surroundings and on ephemeral contracts.

However, for many others this migration-centred system is hugely disruptive, and can add to the forces that squeeze talented scientists out of academia and into other careers.

The ‘young’ people whom science labels as
problem: It is often not possible, or wise, for them to drop everything
and move every few years, especially if they have children. Yet making 
the best decision for their families can harm their careers. For example, 
my current fellowship is based in Manchester, UK, but my partner has 
a job in London — a few hours away by train — and is understandably
reluctant to leave. We have been very lucky: the terms of my fellow- 
ship mean that I have a degree of independence and can travel a lot, 
allowing us to live together. But having made the choice to reduce the 
amount of time spent at my institution, I find it hard to contribute to 
many aspects of departmental life. I worry that this might limit my 
future options. The effect is surely even greater for female scientists, 
whose careers often already suffer as a result of family obligations. 

Simply put, the career framework for young scientists was established 
at a time when wives and partners did not neces- 
sarily work and were expected to follow the — 
generally male — breadwinner as he worked his 
way up. That (thankfully) is not the world we live 
in now. Society has changed and science should 
change with it. 

Institutional policies can ease the move. In 
the United States, for instance, a number of uni- 
versities make an effort to help to find jobs for 
researchers’ partners. But relocation should not 
be necessary. In the long term, cultural change is 
required — just as it is to address, for example, 
the under-representation of women in science, 
which is exacerbated by the two-body problem. 

There are some straightforward steps that we 
can take. First, guidelines for grant reviewers, 
job panels and academics should make clear that 
personal factors are as important and legitimate 
as professional ones when it comes to making 
career choices. Instead of demanding that all young researchers move 
institutions, funding agencies could consider personal motivations 
on a case-by-case basis, just as they currently judge the strength of an
applicant's science. 

Second, principal investigators could ensure that young scientists 
have the chance to pursue independent research without leaving the 
lab, and to publish the results. Early-career researchers should push 
for such opportunities, and institutions should encourage and nurture 
them. For example, one afternoon a week could be set aside for early- 
career scientists to conduct self-directed research. 

Staying in one place has the potential to stifle independence. But 
that risk should be measured against the danger that the scientist will 
be forced out of research — and that ultimately, science will lose out.


Russell Garwood is an 1851 research fellow at the University of 
Manchester, UK. 
e-mail: russell.garwood@manchester.ac.uk
GLACIOLOGY


Refrozen water 
warms glacier 


Meltwater flowing beneath 
Greenland’s glaciers refreezes 
into large ice units that could be 
distorting and even warming 
the overlying ice layers. 

Robin Bell of Columbia 
University’s Lamont- 
Doherty Earth Observatory 
in Palisades, New York, and 
her team used radar data to 
identify subglacial ice units 
across northern Greenland. 
The authors found significant 
warping of the surrounding 
layers, which they attribute to 
the refreezing meltwater below. 

Moreover, these ice units 
were found in areas of fast 
glacier flow. The authors 
suggest that energy released 
from the meltwater as it 
refreezes is warming the ice 
above, and thus speeding up 
the glacier’s march towards the 
ocean. 
Nature Geosci. http://doi.org/s7j 
(2014) 


ELECTRONICS


Stretchy battery woven into fabric


Researchers in China have incorporated relatively powerful lithium-ion wire batteries into textiles — a step towards better power sources for wearable electronics.

Lithium-ion batteries in general are more powerful than current wearable energy-storage devices, but can short-circuit and combust if stretched or distorted during use. Huisheng Peng, Yonggang Wang and their team at Fudan University, Shanghai, overcame this by incorporating safer lithium-oxide nanoparticles into carbon nanotube yarns. These yarns, which form the batteries’ electrodes, were twisted around a piece of elastic, creating a stretchable structure that could be woven into textiles (pictured).

The wire battery produced 10 times more power per cubic centimetre than non-stretchable, thin-film lithium batteries and maintained 84% of its capacity after being stretched 200 times.
Angew. Chem. http://doi.org/f2r6pv (2014)


ZOOLOGY


How ants link up to build bridges


Fire ants band together into rafts and bridges by each making an average of 14 connections with adjacent ants.

The insects (Solenopsis invicta) form networks (pictured) to cross streams and deal with floods. To study the networks’ structure, David Hu and his team at the Georgia Institute of Technology in Atlanta froze clumps of ants with liquid nitrogen, coated them with vaporized glue and imaged them with a micro-computed-tomography scanner. The team found that the ants grab hold of each other using adhesive pads on their legs. The insects also tend to orient themselves perpendicularly to one another, with smaller ants slotted in between larger ones to maximize the number of connections between them.

The ants could inspire the development of robots and smart materials that assemble into new structures, the authors say.
J. Exp. Biol. 217, 2089-2100 (2014)


ECOLOGY


Stick together to fight disease


Isolated plant populations are more vulnerable to disease than highly connected ones, contrary to popular thinking.

Diseases are thought to spread more quickly in dense populations, which facilitate the transfer of disease from one group to another. But Anna-Liisa Laine of the University of Helsinki and her team found a different pattern when they tracked more than 4,000 populations of the weed Plantago lanceolata over 12 years on the Åland Islands in the Baltic Sea. Rather than being protected, isolated populations were infected by the mildew Podosphaera plantaginis more often than weeds in dense networks.

The team then studied samples from 22 plant populations in the lab and found that plants from highly connected populations were generally more disease resistant than their counterparts from fragmented populations, possibly because resistance genes are more readily exchanged among populations located near each other.
Science 344, 1289-1293 (2014)


Global quantum 
clock proposed 


A set of atomic clocks linked 
together using the principles of 
quantum physics could be the 
authoritative world clock — 
more accurate and stable than 
any atomic clock today. 

Mikhail Lukin of Harvard 
University in Cambridge, 
Massachusetts, and his team 
propose combining ultra- 
precise atomic clocks using 
quantum entanglement, which 
links the quantum states of 
particles separated over large 
distances. Entangling the 
clocks would allow scientists 
to combine measurements in 
a way that reduces the overall 
noise, rendering the combined 
signal more accurate. The 
resulting space-based network 
could be used to synchronize 
timekeeping standards globally, 
the authors say. 

Building the clock will 
require technological advances, 
such as improving the stability 
of clock signals sent through 
Earth’s turbulent atmosphere. 
Nature Phys. http://doi.org/s7k 
(2014) 


Genome editing of 
stem cells 


A genome-editing system 
allows researchers to introduce 
multiple gene alterations into 
human stem cell lines. 

A team led by Danwei 
Huangfu at Memorial Sloan- 
Kettering Cancer Center in 
New York used the recently 
developed genome-editing 
systems TALEN and CRISPR- 
Cas9 to efficiently create 
human embryonic stem (ES) 
cells and induced pluripotent 
stem (iPS) cells containing 
up to three different gene 
alterations. The researchers 
used their approach to 
introduce various mutations 
linked to Alzheimer’s disease 


into iPS cells, as well as to delete 
certain genes in pancreatic cells 
derived from ES cells. 

The method should make it 
faster and easier to determine 
the effects of disease-related 
gene changes on cell and tissue 
development, the authors say. 
Cell Stem Cell http://doi.org/s6w 
(2014) 


PARTICLE PHYSICS 


Exotic four-quark 
particle confirmed 


A team have confirmed the 
existence of a four-quark
particle, named Z(4430). The 
finding, together with other 
exotic particles, challenges the 
idea that quarks only combine 
in pairs (mesons) or triplets. 

Z(4430) was first spotted 
in 2008 at the Belle detector 
in Japan, but another detector 
in California failed to see it, 
casting doubt on the initial 
observations. A team working 
on the LHCb experiment 
at CERN, Europe's particle 
physics laboratory near Geneva 
in Switzerland, analysed 
about a billion high-energy 
proton-proton collisions. The 
scientists noticed that in about 
4,000 cases there was a highly 
significant Z(4430) signal — 
about 14 standard deviations 
above background levels. 

The authors determined that 
the particle is composed of four 
quarks because of its observed 
decay patterns, and is not an 
artefact of interactions between 
ordinary two-quark mesons. 
Phys. Rev. Lett. 112, 222002 (2014)


REGENERATIVE BIOLOGY 


Love hormone 
revitalizes muscles 


The hormone involved in social bonding also enables old muscles to rejuvenate.

Wendy Cousin, Irina Conboy and their colleagues at the University of California, Berkeley, injected the hormone oxytocin into old mice, and found that after an injury the muscles in these animals had similar regeneration levels to muscles in young mice. The hormone improves repair by activating a signalling pathway in muscle stem cells — thereby boosting the cells’ proliferation. Moreover, mice engineered to lack oxytocin showed decreased muscle repair and greater loss of muscle tissue compared with normal mice of the same age.

Oxytocin could be used as a drug to prevent or slow down muscle ageing, the authors say.
Nature Commun. 5, 4082 (2014)


SOCIAL SELECTION

Popular articles on social media


Lab animals spark debate


Social media is hosting the latest round of the debate over medical studies involving animals. Writing in the British Medical Journal, Yale University epidemiologist Michael Bracken and UK medical sociologist Pandora Pound argued that too many animal trials investigating medical treatments are poorly designed, and called for better use of systematic reviews to maximize their benefit. Lenny Verkooijen, a clinical epidemiologist at University Medical Center Utrecht in the Netherlands, tweeted that there is “insufficient systematic evidence for the clinical benefits of animal research”. But in a letter to the journal, pharmacologist Fernando Martins do Vale at the University of Lisbon noted that animal research has benefited medicine and has led to “seminal discoveries in the field of physiology, biochemistry, pharmacology and genetics”.
Pound, P. & Bracken, M. B. Br. Med. J. 348, g3387 (2014)

Based on data from altmetric.com. Altmetric is supported by Macmillan Science and Education, which owns Nature Publishing Group.


ANIMAL BEHAVIOUR 


Apes cooperate on 
their own 


Without any prior training, captive chimpanzees team up on a task, suggesting that the primates are more cooperative than previously thought.

Malini Suchak, now at Canisius College in Buffalo, New York, and her colleagues designed a device that required one or two chimps (Pan troglodytes) to remove a barrier in order for another individual to simultaneously obtain a tray of food. The researchers placed the device in a large enclosure in which 11 chimps lived, and found that the animals spontaneously worked together in groups of two or three to complete the task (pictured) more than 3,000 times — an average of 38 per one-hour session. Unlike in most previous studies, the apes were free to choose their own partners, which could have allowed them to avoid competitors that might impede cooperation. Complex cooperative behaviour is not a uniquely human trait, the authors suggest.
PeerJ 2, e417 (2014)
SEVEN DAYS


Urologist sentenced 


Austrian urologist Hannes Strasser will have to serve a two-year sentence for crimes relating to his use of an unauthorized stem-cell therapy to treat urinary incontinence, an appeals court confirmed on 11 June. The treatment was ineffective in many patients, and harmed others. The scandal was exposed six years ago, when Strasser was a professor at the Medical University of Innsbruck (see Nature 454, 922-923; 2008). The university was not allowed to dismiss him because of a ruling from a national employment committee, but the court judgment means that he now has to be dismissed.


Drug data freed 


The European Medicines Agency has agreed in principle to publish clinical-trial reports on any drug that receives marketing approval in the European Union. It announced the move on 12 June and is the first major drug regulatory agency to take such a step. The agency says that the shift will improve transparency in the medicine approval process and make it easier for academics to conduct non-commercial research. See go.nature.com/obsicm for more.


Chile axes dam plan


The Chilean government said on 10 June that it had rejected plans for a controversial hydroelectric project in southern Patagonia. The 2,750-megawatt HidroAysén project won partial government approval in 2011, but was dogged by concerns about the environmental and social effects of building five new dams on Patagonian rivers. The government's latest move may not spell the end for the dam, however. Its backers, power companies Endesa Chile and Colbún, could revise their plans and seek permission to go forward with an updated version of the project, media reports say.


Ivory poaching continues apace in Africa


More than 20,000 African elephants were poached across the continent last year, finds a report by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Published on 13 June, the report uses the latest figures from CITES programmes that monitor poaching. Overall poaching numbers were lower in 2013 than in the previous two years, but continue at levels that will exacerbate the decline of the African elephant population. The report also shows a rise in seizures of large ivory shipments (weighing more than 500 kilograms). For the first time, more large seizures were made in Africa than in Asia, 80% of which occurred in Kenya, Tanzania and Uganda. The results will be discussed at a CITES meeting in Geneva, Switzerland, on 7-11 July.


TB drug too costly


The high cost of treating a form of tuberculosis that is resistant to many drugs could allow the disease to spread, physicians warn. An effective new drug, bedaquiline, is available, but costs up to US$30,000 for one course of treatment. On 12 June Caitlin Reed of the Olive View-University of California, Los Angeles, Medical Center told the 2014 National TB Conference in Atlanta, Georgia, that the drug would be too expensive. Reed is currently using bedaquiline to treat a patient who, she said, has the most drug-resistant form of tuberculosis ever seen in the United States.


Biofuel cap


By 2020, biofuels made from food crops should be limited to providing only 7% of all transport fuel in the European Union, European ministers agreed on 13 June. The limit is more generous than the 5% cap originally proposed by the European Commission, but awaits a vote from the newly elected European Parliament. Scientists have long warned that fuels such as biodiesel made from palm oil can produce more carbon emissions than the fossil fuels they replace (see Nature 499, 13-14; 2013).


Integrity audit 
Scientists in Ireland should 
expect their research processes 
to be audited by outside 
consultants, according to 
plans outlined by Mark 
Ferguson, director-general
of the basic-research funding
agency Science Foundation
Ireland. Ferguson told Nature
that the aim of the audit is to
ensure that work funded by
the agency is being conducted
with integrity. He hopes that
the first annual audits will 
begin by the end of the year. 
See page 325 for more. 


EVENTS 


Park oil-drilling axed 
Oil company SOCO 
International is suspending 
explorations in Virunga 
National Park in the 
Democratic Republic of the 
Congo, it announced on
11 June. Virunga is Africa’s
oldest national park and
is home to the critically
endangered mountain gorilla
(Gorilla beringei beringei).
The move came after
conservation groups led by
the WWF filed a complaint
with the Organisation for
Economic Co-operation and
Development. SOCO, which
is based in London, has agreed
not to drill in the park or in any
other site given World Heritage
status by the United Nations.


RIKEN report


An independent committee
has recommended that
the RIKEN Centre for
Developmental Biology
in Kobe, Japan, should
close because of its role
in the publication of two
problematic papers in Nature.
The research in question
purported to describe a
new method for generating
embryonic stem cells, but the
papers were found to include
duplicated images, among
other problems. On 12 June,
a committee looking into
research misconduct told a
press conference (pictured,
with committee head Teruo
Kishi speaking) that it had
found structural flaws in the
running of the Kobe centre and
called for it to be dismantled.
RIKEN is planning structural
reforms and intends to appeal
against the judgement.


TREND WATCH


China’s coal-intensive electricity
grid means that making a silicon
solar panel there, although
cheaper, leaves a carbon
footprint almost twice as large
as that from making one in
Europe, according to a study led
by Fengqi You at Northwestern
University in Evanston, Illinois
(D. Yue, F. You and S. B. Darling
Sol. Energy 105, 669-678; 2014).
But emissions per kilowatt-hour
of electricity produced by even
the ‘dirtiest-made’ solar panel are
some 16 times lower than those
from a typical coal plant.


Cancer trial 


A groundbreaking clinical 
trial in lung cancer began 
enrolling patients on 16 June. 
The five-year Lung Cancer 
Master Protocol trial will 
assign up to 1,000 patients
per year to receive one
of five experimental
treatments, depending on
the genetic mutations in their
tumours (see Nature 498,
146-147; 2013). It unites five
pharmaceutical companies,
and will be led by the SWOG
Cancer Research consortium
in Portland, Oregon, and
administered by the US
National Cancer Institute. It is
intended to serve as a model
for how clinical trials can be
streamlined and personalized.


Hubble search 

The Hubble Space Telescope 
has begun searching for an icy 
world in the outer Solar System 
that NASA’s New Horizons
mission can visit after its
fly-by of Pluto in July 2015.
The search was announced
on 16 June by the NASA
committee that allocates
observing time on Hubble.
Mission scientists were unable
to identify a suitable candidate
in the Kuiper belt using
ground-based telescopes,
and they hope that Hubble's
vantage point will give them a
better view. See go.nature.com/nayaec for more.


H7N9 predictions


Researchers have developed a
model that accurately predicts
which live-poultry markets are
at risk of becoming infected
with the H7N9 avian influenza
virus that has swept across
China. Most human cases
of the virus have occurred
through exposure at such
markets. The research team
conducted a census of 8,943
live-poultry markets in China,
and found that local density
is the most important factor
in predicting the risk of
outbreaks (M. Gilbert et al.
Nature Commun. 5, 4116;
2014). The findings should
help authorities to develop
better control measures.


The carbon dioxide emissions created when photovoltaic (PV) solar
panels are made in China are twice as high as for those made in Europe.


21-26 JUNE

The Euroscience
Open Forum meets in
Copenhagen to discuss
the future direction of
research and science
policy in Europe.
esof2014.org/info


23-27 JUNE

The first meeting of
the United Nations
Environment Assembly
takes place in Nairobi.
Discussions will
include the sustainable
development goals that
aim to reduce global
poverty.
www.unep.org/unea/en


Child-study hold-up 
A US study of 100,000
children that was authorized
by Congress 14 years ago
may face further delays. An
external review released on
16 June found that planning
for the National Children’s
Study lacked proper scientific
input. See page 323 for more.


BUSINESS
Tesla opens patents


Electric-car company Tesla
Motors has announced that
it will let other firms use the
technology it has patented.
The company, headquartered
in Palo Alto, California, said
on 12 June that it would
not initiate patent lawsuits,
apparently in an effort to
promote growth in the market
for electric vehicles and to
encourage common standards
for supporting infrastructure,
such as battery chargers.
The terracotta army, which consists of individually sculpted warriors, was found inside the mausoleum of China’s first emperor, Qin Shi Huang.


ARCHAEOLOGY 


3D images remodel history 


Digital-photo software promises to offer unprecedented access to artefacts and sites. 


BY EWEN CALLAWAY 


It took hundreds of thousands of workers
decades to create China's terracotta army,
but digital avatars made in minutes could
solve the lingering mystery of one of the
country’s most famous relics. By creating
three-dimensional (3D) models of the 2,200-year-old
collection of statues, archaeologists hope to
confirm whether the soldiers were intended
to represent a real army of distinct individuals.

Known broadly as computer vision, the
technology was developed to enable machines
such as factory robots and the Mars rovers to
map a 3D world from camera images. But now
it is quietly revolutionizing archaeology and
palaeontology, allowing virtual bones, artefacts
and whole excavation sites to be shared and
studied without risk of damage.

“In the future, it’s highly likely that these 
sorts of methods will be the standard thing 
you do to record an archaeological site,” says 
Andrew Bevan, an archaeologist at University
College London, who is part of a team using
computer vision to build digital models of the
terracotta army’s life-size warriors.

Since the army was discovered in 1974 in 
an emperor’s mausoleum near Xian, histori- 
ans have debated whether the soldiers’ facial 
details were modelled on actual militiamen. 
“Are the warriors portraits of individual peo- 
ple? Or are they a ‘Mr Potato Head’ approach 
to individualism, where you slap on different 
noses and moustaches and ears?” Bevan says. 

Computer-vision models might offer the
answer, Bevan suggests. Digital photos
can be taken quickly, cheaply and without
disturbing the statues. Several dozen high-quality
photos of a soldier, taken from multiple
perspectives, can provide a computer algorithm
with enough data to determine where
each image was taken from and create a 3D
map in a few minutes. The model, a set of x,
y and z coordinates, can be plotted against
other models, analysed and even used to make
a cast with a 3D printer.
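
The idea of plotting one set of x, y and z coordinates against another can be illustrated in a few lines. The sketch below is hypothetical and is not the method used by Bevan's team: it assumes that two ear models have already been reduced to the same number of matching landmark points and share the same orientation. It removes any difference in position by centring each model on its centroid, then reports the root-mean-square distance between matching points (0 for identical shapes). A real comparison would also have to correct for rotation and scale.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def centre(points: List[Point]) -> List[Point]:
    """Translate a point set so that its centroid sits at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def rms_deviation(a: List[Point], b: List[Point]) -> float:
    """Root-mean-square distance between corresponding points of two models,
    after removing any difference in position (translation only; no rotation)."""
    if len(a) != len(b) or not a:
        raise ValueError("models must have the same, non-zero number of points")
    ca, cb = centre(a), centre(b)
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(ca, cb)
    )
    return math.sqrt(total / len(a))


# Hypothetical landmarks: a model and the same model shifted in space.
ear_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
ear_b = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in ear_a]
print(rms_deviation(ear_a, ear_b))  # 0.0
```

Moving a single landmark, rather than the whole model, gives a positive score, which is the kind of small difference the superimposed ear models make visible.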

In a pilot study published on 4 June, Bevan’s
team modelled the faces of 30 warriors and
found that no two ears were identical, evidence
that the army consists of individuals (A. Bevan
et al. J. Archaeol. Sci. http://doi.org/s7v; 2014).
The researchers compared ears because these
are unique and may have been modelled on
real people. But they plan to analyse other
anatomical features to see whether the
soldiers vary in ethnicity or bear the
hallmarks of distinct craftsmen. Bevan
stresses that the work is at an early stage.

Archaeologists and palaeontologists have used computer modelling
for decades, to map digs with laser
scanners or study bones with computed
tomography (CT), for example. But proponents
of computer vision argue that these
technologies are costly and not made for
routine use in the field.

“You're talking about having a camera
versus having a £30,000 [US$50,000] piece
of kit ready,” says Sarah Duffy, an archaeologist
at the University of York, UK. When
900,000-year-old footprints were found on
eastern England’s Norfolk coast last year, she
was part of a team that raced to photograph
the scene and capture the footprints in 3D. The
resulting model revealed that they had been
left by a human ancestor, the oldest such
relics discovered outside Africa (N. Ashton
et al. PLoS ONE http://doi.org/rd2; 2014). The
prints had nearly vanished by the time the
researchers lugged a laser scanner to the site
a week later.

Superimposed 3D models (one in green, the other
in white) reveal minute differences in ear shape.

Benjamin Ducke at the German Archaeo- 
logical Institute in Berlin agrees that the 
technology has the potential to preserve sites 
that are disappearing. Last October, he used a
drone equipped with a video camera to create
a 3D map of a large pre-Columbian settlement
in Mexico in a couple of days. His team, called
Project Archaeocopter, plans to analyse sites 
in Uzbekistan and at Pompeii in Italy. With 
an infrared camera mounted on a drone, the 
technology could map archaeological sites 
obscured by dense forests, he says. 

Powerful computer-vision software is
affordable and readily available, but advocates
such as Heinrich Mallison, a palaeontologist
at Berlin’s Natural History Museum, see the
technology as more than a time and money
saver. “It means we can expect to see entire collections
of hundreds of thousands of objects
digitally available in a decade, so everybody can
use these for research,” he says. Ducke thinks
that the technology has the potential to break
the “interpretative monopoly” of scholars
whose theories prevail because others lack
access to particular artefacts or remains.

Jean-Jacques Hublin, a palaeoanthropolo- 
gist at the Max Planck Institute for Evolution- 
ary Anthropology in Leipzig, Germany, expects 
museums to limit the creation and distribution 
of such models in their collections, in the same 
way as some have done for CT scans. Museums 
worry about losing control over their collec- 
tions, but Hublin thinks that demand among 
scientists will inevitably push more collec- 
tions online. With computer-vision technol- 
ogy in mind, in May the European Union 
began accepting applications for a €14-million 
($19-million) fund to create 3D models of 
examples of Europe's cultural heritage. 

But data theft is a worry, Mallison says. 
“I can go to a museum in Beijing, pull out 
my Canon, play tourist and do research on 
a high-resolution 3D model of their fossils.” 
Academics might not risk the backlash of 
collecting data without permission, but replica 
sellers could pillage museum collections with 
computer-vision software, says Mallison. He 
thinks that international rules are needed to 
prevent this. Nevertheless, he predicts that it 
is only a matter of time before 3D models of 
museum collections are widely available. “The 
question is, do we see it in 5 years or 10 years or 
15 years?” he says.


A. BEVAN ET AL. J. ARCHAEOL. SCI. HTTP://DOI.ORG/S7V (2014)/CC-BY


Tree hitched a ride to island 


Acacia analysis reveals globetrotting seed trekked 18,000 kilometres from Hawaii to Réunion. 


BY EMMA MARRIS 


In what is probably the farthest single
dispersal event ever recorded, researchers
have shown using genetic analysis that an
acacia tree endemic to Réunion Island in the
Indian Ocean is directly descended from a
common Hawaiian tree known as the koa. In
fact, these two trees on small specks of land on
opposite sides of the globe turn out to be the
same species.

The event is remarkable not just for the sheer
distance covered (some 18,000 kilometres,
almost the farthest apart that any two points
on land can be) but because it occurred between
two small islands. Koa seeds are unlikely to have
floated to Réunion: they will not germinate
after being soaked in seawater, and the trees
grow in the mountains, not near the shore. The
researchers, led by Johannes Le Roux, a molecular
ecologist at Stellenbosch University in Matieland,
South Africa, propose in a study published
this week that a sea bird brought a seed from
Hawaii to Réunion in its stomach or stuck to
its feet in a one-off event some 1.4 million years
ago (J. J. Le Roux et al. New Phytol.
http://dx.doi.org/10.1111/nph.12900; 2014).

Le Roux notes that the physical similarities 
between the two trees, Acacia heterophylla 
from Réunion and Acacia koa from Hawaii, 
have been known for decades. “To me the
most exciting thing is that we have solved this
riddle,” he says. “And how improbable is it?”

Le Roux and his team sequenced the DNA 
from 88 trees, including A. heterophylla, A. koa 
and a closely related acacia species from Aus- 
tralia, where the family originated. They found 
that all the acacias on Réunion share a genetic 
signature that is just one mutational step away 
from that of some Hawaiian koas. Using the 
slight differences between the trees’ sequences, 
they developed a family tree, which clearly 
showed that all A. heterophylla are more closely 
related to one type of Hawaiian koa than some 
other types of koa are to each other. 

To work out when the dispersal event took 
place, the team used a ‘molecular clock’. This
counts up genetic changes between populations 
and uses an estimated mutation rate to derive 
the date that populations first split. The team 
knew that the koa tree originally came from 
Australia, and that the earliest point at which 
it could have become established on Hawaii 
was when Kauai, one of the older Hawaiian 
islands with the high elevations that koas pre- 
fer, formed 5.1 million years ago. Comparison 
of the Hawaiian koa and the trees on Réunion 
then showed that mutations that occurred in 
the subsequent 3.7 million years were present in 
both lineages. But mutations that occurred after 
that were found in either the Réunion trees or 
the Hawaiian trees, but not in both; this genetic 
divergence suggests that the dispersal event 
took place 1.4 million years ago. 
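
The dating argument above reduces to simple arithmetic. The sketch below assumes a strict molecular clock, in which substitutions accumulate at a constant rate μ per site per year on each lineage, so that two lineages differing at a fraction d of sites split about t = d/(2μ) years ago; the factor of 2 reflects change accumulating on both branches. The function names and the example rate are hypothetical illustrations, not values from Le Roux's study. The second function simply repeats the article's own subtraction: a 5.1-million-year calibration for Kauai minus 3.7 million years of shared history leaves about 1.4 million years.

```python
def split_time(divergence_per_site: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split under a strict molecular clock.

    Differences accumulate independently along both branches, so the
    observed divergence is divided by twice the per-lineage rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return divergence_per_site / (2 * rate_per_site_per_year)


def dispersal_age_myr(calibration_myr: float, shared_history_myr: float) -> float:
    """Age of the split in millions of years: the calibration point
    minus the span of history that both lineages still share."""
    return calibration_myr - shared_history_myr


# The article's figures: Kauai formed 5.1 Myr ago; 3.7 Myr of shared mutations.
print(round(dispersal_age_myr(5.1, 3.7), 1))  # 1.4

# Hypothetical clock example: 0.28% divergence at 1e-9 substitutions/site/year.
print(split_time(0.0028, 1e-9))
```

The hypothetical rate was chosen only so that the second result lands near the article's 1.4-million-year figure; real analyses estimate rates with uncertainty rather than fixing a single value.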

Le Roux has ruled out the possibility of 
humans transferring the seed, because the 
molecular clock suggests that genetic changes 
began long before humans arrived in Réunion. 
“Despite its close genetic relatedness to koas 
from Hawaii, you see there is already diver- 
sification that is unique to Réunion,” he says. 

The startling finding is the latest in a string 
of improbable long-distance dispersal events 
that have been uncovered in the past 15 years 
or so. These include the proposed movement 
of New World (flat-nosed) monkeys on a raft
from Africa to South America less than 50 million
years ago, long after the two continents
split; and the transfer of sundew carnivorous 
plants (Drosera species) from western Aus- 
tralia to Venezuela, probably by birds (see ‘Far 
and wide’). Such findings have shaken up the 
field of biogeography, which concerns itself 
with why species are found where they are. 

In the past, similar species found on differ- 
ent land masses were presumed to be the result 
of the continents slowly drifting apart, says 
Alan de Queiroz, an evolutionary biologist at 
the University of Nevada, Reno, and author 
of The Monkey's Voyage (Basic, 2014), a book 
about long-distance dispersal. And islands were 
thought to be largely dead ends when it came to 
species dispersal. “Things don't go from islands,” 
he says, “or at least that was the general thought.” 

But the newly discovered long-distance
events are changing that opinion, and
biogeographers are increasingly stressing the
role of improbable events and serendipity
in shaping which species occur where. “The
event [of the koa dispersal] is a giant fluke, but
that’s part of the message of a lot of recent biogeographic
studies: that giant flukes happen,”
de Queiroz says.

FAR AND WIDE: Several species are thought to have colonized areas far from their place of origin
as a result of long-distance dispersal.

Ecologists have now shown that these acacia trees on Réunion are the same species as those on Hawaii.

As these accounts of long-distance dispersal
accumulate, some ecologists say that the next
challenge is to make predictive generalizations
about how often such events occur and which
mechanisms (such as bird dispersal or rafting)
are most important. “What we need to do is
go beyond this accumulation of anecdotal evidence,”
says Ran Nathan, a movement ecologist
at the Hebrew University of Jerusalem.

But the problem is that the rarity and acci- 
dental nature of such events may defy catego- 
rization. “There could be an argument that 
you get an endless list of very, very strange and 
peculiar mechanisms,” says Nathan. “There 
will be a long list, but there will be some mechanisms
that are much more frequent.”

Ecologist Jon Waters of the University of 
Otago in Dunedin, New Zealand, says that 
despite the potentially large role of long- 
distance dispersals in organizing global flora 
and fauna, such dispersals are not completely 
random or unpredictable. “As well as thinking 
about geographic proximity in making predic- 
tions about dispersal, there are numerous other 
factors to consider, such as oceanographic con- 
nectivity patterns, prevailing winds, storm 
tracks and even bird migrations,” he says. 

In other words, the distribution of some 
species may be the result of chance, time and 
luck — but there are still patterns. And science 
still has a part to play in elucidating them.
US child study hits buffers


Launch date for cohort study set to be delayed as data problems are identified. 


BY HEIDI LEDFORD 


Like proud new parents, US researchers
had high hopes for the National Children’s
Study (NCS). It would track 100,000 children
from birth to age 21, provide a wealth of
data about environmental effects on health and
yield a greater understanding of health disparities
between different ethnicities and income
levels. It might even reveal links between exposures
and conditions such as asthma, autism and
attention deficit hyperactivity disorder, which
are increasingly common in children.

But 14 years after planning began, with more
than US$1 billion spent and 5,050 children
enrolled in a pilot phase, the study still lacks
the scientific grounding it would need to be
fully implemented next year as scheduled, a
review by the US National Research Council
has found.

“The study has great promise,” says Greg
Duncan, an economist at the University of
California, Irvine, and chair of the review
committee. “But we did identify a number of
problems that need to be addressed.”


HISTORY OF SETBACKS 

The council’s report, released on 16 June, is 
the latest blow to a study mired in contro- 
versy. During the 2000s, the administration 
of then-president George W. Bush repeatedly 
attempted to cancel the NCS, only for Con- 
gress to restore its funding. In 2012, the study 
was scaled back in the face of projections that it 
would cost more than twice the initial estimate 
of $3 billion over 25 years. 

The latest report finds that the study’s proto- 
cols for data collection have not yet been final- 
ized or tested, and administrators failed to back 
up important decisions with scientific docu- 
mentation. The review panel also says that the 
scientific hypotheses used to guide study design 
were poorly defined. “The hypotheses were just
silly,” says Nigel Paneth, an epidemiologist at
Michigan State University in East Lansing, who
was involved with the study before it was scaled
back. “They bore no relationship to any public-health
goal that I could recognize.”

The wait for the study’s launch has been prolonged.

The panel traces many of the study’s prob- 
lems to a lack of expertise in the programme's 
management office at the US National Institute 
of Child Health and Human Development. 
The authors note that the office does not seem 
to have incorporated feedback from scientists 
on key decisions, and they highlight a series 
of ensuing concerns, including an insufficient 
model for comparing the effectiveness of dif- 
ferent study designs. “The panel is concerned 
that the Program Office may not have suffi- 
cient in-house expertise in relevant scientific 
and survey research disciplines to enable it to 
function effectively,” the committee writes. 

Funding problems and design issues have
plagued the study since it was authorized by
the Children’s Health Act of 2000. When the 
pilot study suggested that the initial strategy of 
going door to door to enlist participants would 
be too expensive and slow, study planners 
began recruiting volunteers through group 
health-care providers. That raised concerns 
that the study would exclude rural areas not 
served by such groups. The drive to cut costs 
also led the programme to contract out data 
collection to private consulting groups instead 
of academic investigators. 


UNDER REVIEW 

In March 2013, amid concerns about the effects 
of these changes, Congress requested that the 
National Research Council and the Institute of 
Medicine review the NCS and withhold pay- 
ment on contracts related to the study until 60 
days after the review was completed. 

The resulting report raises valid issues, says 
Francis Collins, director of the US National 
Institutes of Health: “They had substantive 
concerns about the study design and oversight 
and we ought to take that seriously.”

Collins plans to convene a panel of experts 
to assess the study’s next steps and to gauge 
whether it is time, given the project's long his- 
tory, to update its design to incorporate techno- 
logical advances in electronic medical records 
and ways to assess environmental exposures. 
“If we're going to be doing this for 21 years, let's
make sure we're making the very best use of
everything that’s available to us,” he says.

Duncan declines to speculate on how long 
it will take the study’s organizers to incorpo- 
rate the committee’s recommendations, which 
include soliciting input from outside research- 
ers and incorporating a scientific review-and- 
approval process. But given the information 
provided to the review committee, he says that 
the study was already unlikely to start on time. 
“We expected to see a lot of completed protocols
for sampling and early data collection,” he
says. “We didn't.”
Women in sub-Saharan Africa are often not tested for HIV until they become pregnant.


HIV trial attacked 


Critics question ethics of allowing pregnant women to receive 
treatment that falls below the standard in their country. 


BY ERIKA CHECK HAYDEN 


Treatment of people with HIV has
advanced so much that some doctors
and activists are urging the US National
Institute of Allergy and Infectious Diseases
(NIAID) to stop a trial that compares how well
older and newer protocols keep mothers from
passing HIV on to their newborn babies.

The Promoting Maternal-Infant Survival 
Everywhere (PROMISE) study is comparing 
three ways of delivering antiretroviral drugs 
to pregnant women. In Option B+, women 
receive a three-drug cocktail called highly 
active antiretroviral therapy (HAART) and 
stay on it indefinitely. In Option A, the women 
take a single drug during pregnancy and in 
Option B, they take the triple-drug therapy, 
but stop shortly after delivery or finishing 
breastfeeding as long as their immune-cell 
counts have not dropped to unhealthy levels. 

Mary Glenn Fowler, the leader of the trial
and a physician at the Johns Hopkins Bloomberg
School of Public Health in Baltimore, Maryland,
says that the study will provide crucial
evidence about whether exposing pregnant
women to the aggressive antiviral treatment
used in options B and B+ puts them and their
newborns at unnecessary risk. “It is critical to
be sure we're doing no harm,” she says.

324 | NATURE | VOL 510 | 19 JUNE 2014

But last year, the World Health Organization
(WHO) recommended that, where
feasible, all pregnant women with HIV
receive Option B+. “The PROMISE trial has
become almost redundant,” says physician
Erik Schouten, who works for the non-profit
advisory organization Management Sciences
for Health in Malawi, one of 15 countries in
which the trial is being conducted.

Yet in December, leaders of the group
conducting the study, known as the International
Maternal Pediatric Adolescent AIDS
Clinical Trials network, decided to continue
PROMISE but stop other trials in the face of
a 32% budget cut. And in May, an independent
review board said that the trial’s output so
far justified its continuation. “We're kind of
in between a rock and a hard place,” says Carl
Dieffenbach, director of the NIAID’s Division
of AIDS.

Option B+ has not been shown in a large
clinical trial to protect infants any better than
Option B, acknowledges Jennifer Cohn, medical
director of the Médecins Sans Frontières
Access Campaign in Geneva, Switzerland.
But “the movement is towards treatment simplification”,
she says, and Option B requires
choices to be made about when treatment
should be restarted.

In places where lab facilities are scarce and 
women have an average of five to six children, 
doctors often find it difficult to follow proto- 
cols that require regular immune-cell testing 
and resumption of treatment if levels drop or 
the woman becomes pregnant again. Partly 
for that reason, countries such as Uganda, 
Malawi, Tanzania and Zambia have now 
opted for the simplicity of Option B+. But 
this means that some women who enrol in 
the PROMISE trial will receive less-aggres- 
sive treatment than is recommended in their 
countries. 


INFORMED CONSENT 

“It is inconceivable that properly informed
HIV-infected pregnant women would accept
enrolment in a study where they might receive
treatment that is inferior to that offered by
their national ministry of health,” wrote paediatrician
Arthur Ammann, founder of Global
Strategies, a capacity-building organization 
based in Albany, California, in a 7 June letter to 
officials at the US National Institutes of Health 
(NIH) in Bethesda, Maryland. 

However, Fowler says that the ministries of 
health at all PROMISE sites agreed to continue 
the trial after the WHO revised its guidelines, 
with the exception of Tanzania, which is still 
considering the decision. 

A 2012 review of PROMISE’s design 
commissioned by the NIH concluded that it 
was ethical to continue testing older regimens 
in countries that now use Option B+ because 
there was no evidence at the time that Option 
B+ was any better than those regimens. 

But public-health officials believe that 
because it provides greater coverage, Option 
B+ is more effective at protecting infants than 
the other treatments. According to statistics 
from the United Nations, the number of HIV 
infections in children up to 14 years old halved 
between 2009 and 2012 in Malawi and Zambia, 
and during that period, Option B+ was widely 
adopted in both countries. 

Dieffenbach says that no one aside from 
the independent review board has seen the 
PROMISE data, and that the board has repeat- 
edly said that the trial continues to provide 
useful information. He also says that any plan 
to end the trial would have to ensure that the 
women and children who were enrolled would 
continue to receive care. 

“These are the kinds of things that need to be thought out,” Dieffenbach says, “such as how you would change this trial in a way that is both useful for research and deals in a respectful and ethical way with these women.”
Irish university labs
face external audits 


Funding agency aims to affirm best practice with 
independent checks on research methods. 


BY RICHARD VAN NOORDEN 


Scientists in Ireland will soon have to open up their lab notebooks and explain their research processes to outside auditors, according to plans outlined by the head of the country’s basic-research funding agency.

Mark Ferguson, the director-general of Science Foundation Ireland (SFI), says that he has invited independent consultancy firms to bid for an unusual annual auditing exercise to start before the end of this year. The firms will check whether SFI-funded institutions, including all of Ireland’s leading universities, have procedures in place for reporting and investigating misconduct; whether management has followed those procedures in real cases; and whether any investigations have been carried out to a satisfactory standard.

For a small, random selection of SFI-funded grants, auditors will also check how experimental details have been recorded in lab notebooks and signed off by supervisors. In addition, they might ask to see the data behind particular papers.

“I don’t want to cast us in the role of Big Brother. I want this to be constructive, polite and educative,” Ferguson says. But he notes that checking that research is conducted with integrity is just as important as running routine financial audits. “It’s to pick up mistakes and promulgate best practice. We are all in the business of making sure we are getting the best research for our money,” he adds.

The plans follow the release on 4 June of the National Policy Statement on Ensuring Research Integrity in Ireland (go.nature.com/Ixvreq), which outlines common standards for Irish research. It was signed by major research institutions and funders, and is closely modelled on similar European and British agreements.

But whereas funders in other nations ask research organizations to assure them only that they are following the rules — and in some cases to report data on matters such as misconduct investigations — Ferguson says that annual external audits are needed to check compliance and to maintain public trust that money is well spent. “We have the right to withdraw a grant if there is serious mishandling,” he says, although he does not expect to see anything beyond minor issues.


But some researchers question whether the audits are worthwhile. “This is an interesting idea but I am not sure it will really work. It may serve the profile of funders to say superficially that ‘we have done our job well’, but I doubt that the sample of audited work will be large enough and in depth enough to make any material difference,” says John Ioannidis, a physician who studies research methodology at Stanford University in California. “It may add only another layer of bureaucratic checks.”

The US National Institutes of Health in Bethesda, Maryland, a major funder of biomedical research, takes a similar view. Although it has rules that organizations must abide by, it is not considering external audits, says Sally Rockey, the deputy director for extramural research at the agency. Instead, it is focusing on enhancing researcher training to improve the reproducibility of research.

Ferguson admits that exhaustive audits would not be cost-effective for the SFI, an agency that gives out only around €150 million (US$203 million) a year in grants. Auditors will not have a deep scientific knowledge or be expected to reinvestigate misconduct cases, so the auditing will be “fairly procedural”, he says. The SFI would publish the broad findings of its audits, but not the fine details, which would be shared only with universities. “I don’t want to stigmatize particular grant holders,” Ferguson says.

Beyond early discussions, he has not talked 
through the plans in detail with research insti- 
tutions. Nature’s news team asked four universi- 
ties for comment; they directed a reply through 
the Irish Universities Association. “We look 
forward to further discussing with SFI how the 
practical implementation of the policy may be 
best effected,” a spokesperson said.
Researchers are exploring unconventional sources
of fresh water to quench the globe’s growing thirst. 


BY QUIRIN SCHIERMEIER 


In an effort to combat his country’s long-standing water crisis, Iran’s president took to Twitter last year. “We need plan to save water in agriculture, prevent excessive tap water use, protect underground sources of water and prevent illegal drilling,” Hassan Rouhani tweeted in November.

Iran is far from alone. From the southwest United States to southern Spain and northern China, water shortages threaten many parts of the world. Nearly 800 million people lack access to safe drinking water and 2.5 billion have no proper sanitation.
The situation will probably get worse in coming decades. The world’s popula- 
tion is expected to swell from 7 billion today to more than 9 billion by 2050, even 
as climate change robs precipitation from many parched parts of the planet. If 
the world warms by just 2°C above the present level by the end of the century, 
which scientists believe is exceedingly likely, up to one-fifth of the global popula- 
tion could suffer severe shortages of fresh water. 

“Even without global environmental change, feeding 9 billion people by 2050 will require an additional 2,000-3,000 cubic kilometres of fresh water in agriculture — more than the total global use of water in irrigation,” says Johan Rockström, a specialist on water resources at Stockholm University and director of the Stockholm Resilience Centre. “This equates to nothing less than a new agricultural revolution. Novel approaches, such as water-harvesting practices, are absolutely critical in the future.”

Most countries are seeking to expand access by tapping the underground aqui- 
fers that already supply the bulk of the fresh water for the global population. At the 
same time, some are experimenting with recycling waste water for agriculture and 
other uses. But many nations hope to tap unconventional sources — ranging from 
fog to the ocean — to quench their thirst. Some approaches involve billion-dollar 
deals; others are local efforts that require little in the way of costly technology. 
Here Nature looks at five ways to produce fresh water from unusual sources. 
DESALINATION AT A COST


Like all Mediterranean countries, Israel receives most of its precipitation during the winter months. But last winter, almost no rain fell. In the past, such a drought would have caused severe problems for Israel’s 8.2 million people. But thanks to the seawater desalination plants that Israel has built over the past decade, the country’s taps did not run dry.

Israel's four large ‘reverse osmosis’ plants rank 
among the biggest and most efficient desalina- 
tion facilities in the world. By next year, they are 
expected to provide more than 500 million cubic 
metres of fresh water per year — about half of 
Israel’s needs. In 2012, IDE Technologies in 
Kadima, the company behind three of the existing 
Israeli plants, signed a deal to design a US$1-billion 
desalination facility near Carlsbad, California. 
When completed by 2016, it will supply fresh water 
to about one-tenth of the 3.2 million people living 
in San Diego county. 

A rapidly growing global industry, desalination
has in the past 20 years become an essential source
of fresh water for the Middle East, Australia, the 
United States, South Africa, Spain and, increas- 
ingly, India and China. In 2012, the total amount 
of installed desalination capacity exceeded 80 mil- 
lion cubic metres per day, enough to supply some 
200 million people. 

“With nearly half of the global population living within 100 kilometres of the ocean coast, you just can’t avoid desalination,” says Gary Amy, director of the Water Desalination and Reuse Center at the King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. “Desalination is here to stay and it will inevitably become bigger.”

But by any method, desalination consumes 
much more energy than conventional water 
sources. It takes just over 3 kilowatt hours (kWh) 
of energy to produce 1 cubic metre of potable 
water at the most efficient commercial reverse 
osmosis desalination plants — where pre-filtered 
sea water is forced under pressure through a series 
of semi-permeable membranes. A process that 
evaporates ocean water in thermal plants requires 
about 10 kWh to produce the same amount of 
potable water. Some oil-rich countries do not 
mind the high price: Saudi Arabia's desalina- 
tion industry, for example, currently burns some 
300,000 barrels of oil per day. 

Engineers are trying to improve reverse-osmo- 
sis technology using components such as low- 
energy pumps and advanced membranes. Some 
are experimenting with membranes made of 
graphene to replace the polymers currently used. 
And efforts are under way globally to shift from 
fossil fuels to renewable energies in the desalina- 
tion process. 

Even with those advances, desalination will 
remain costly, says Maria Kennedy, a water-treat- 
ment specialist at the United Nations’ Institute 
for Water Education in Delft, the Netherlands. 
“Nobody decides to do desalination unless they’re 
out of other options.” 


RIVERBANK FILTRATION 


Every July and August, millions of Hindu pilgrims flock to the holy city of Haridwar in India, to visit its temples and fetch water from the Ganges river. The aquifers that supply fresh water to the city cannot keep up with the annual influx of people, so another source is needed. The banks of the Ganges offer a solution.

Germans along the Rhine have been using riverbanks to filter water 
since the 1870s. The method is straightforward: when wells are dug 
next to a river in regions with suitable geology, the river water filters 
through sand and gravel that strips out most of the chemical and 
biological pollutants, and so emerges relatively clean. 

“The treated water may not always meet the water-quality requirements,” says Saroj Sharma, an environmental engineer at the UN’s water institute. But when the river is relatively clean and the geological conditions are favourable, as in Haridwar, it may need only a minor amount of disinfection, says Sharma.

India will have to increase its use of natural water-treatment systems. Groundwater currently provides 85% of the country’s domestic water, but supplies are rapidly declining: in 20 years, about 60% of all of India’s aquifers will be critically degraded, according to the World Bank.

Researchers are now looking to improve the efficiency of technologies for natural water filtration and reuse in India as part of the Saph Pani project, a $6.5-million collaboration at nine sites in the country, funded by the European Union. The studies range from riverbank filtration in Haridwar to wastewater treatment in artificial wetlands in Hyderabad.

Hindu pilgrims gather to bathe in the Ganges.


The Tigray region of northern Ethiopia is notoriously dry, and as a result has experienced repeated famines. But the villagers of Koraro no longer face water shortages, thanks to an imported ancient technology.

Upmanu Lall, director of Columbia University’s water centre in New York City, brought the method to Koraro as part of the university’s Millennium Villages Project, which seeks to fight poverty and hunger in Africa through community-led efforts. While searching for a way to supply the community with water, Lall sought inspiration from waterworks known as qanats, invented by Persian engineers more than 2,000 years ago. These elaborate tunnels carry groundwater from high elevations down to dry valleys and plains; some ancient systems are still in use in Iran and parts of the Arabian Peninsula. In 2009, with $250,000 in funding from the Ceil and Michael E. Pulitzer Foundation, Lall’s engineering students began to design a modern version of a qanat in Koraro.

The village and surrounding fields are on a sandy slope, just a few kilometres away from the steep cliffs of a mountain. The region receives scant rainfall, except in July and August, when flash floods badly erode soil. In the past, villagers had stored rainwater in tanks, but much of that water evaporated quickly and the rest often became polluted.

To get around these problems, the Columbia students, aided by Ethiopian engineers and local villagers, designed a system of small rock dams at the top of the mountain to control surface run-off and allow the rainwater to seep into the subsurface.

The water then flows down through the mountain into a trench measuring 3 metres wide by 3 metres deep, which stretches from the foot of the mountain down the slope to the village 4 kilometres away. The system, which can hold 36,000 cubic metres of water, has been working for three years. The trench recharges the groundwater around Koraro, thus supplying villagers with water for drinking and agriculture. The water has enabled villagers to add an extra planting season, and it supplements irrigation during breaks in the rainy season.

“Just like the master-builders of ancient Persian qanats, we have created an aquifer where actually there wasn’t one,” says Lall. “And, filtered by the sand, the water we produce is of pure drinking quality.”

“Water scarcity is often caused by sporadic rainfall rather than actual lack of water,” says Alberto Montanari, a hydrologist at the University of Bologna in Italy. “The challenge then is to devise sustainable solutions for storing water to make a reserve for the dry season. The Koraro project is an excellent example of how this can be done.”

As word spreads about the success of the scheme, other communities in Tigray are planning to adopt similar techniques. The method, says Lall, could be applied in many locations with appropriate topography and hydrology, including most of Africa’s semi-arid highlands. And Lall is already looking beyond Africa: he is in talks with the state of Jharkhand in northeast India to develop a qanat there.
GREENING THE DESERT 


Agriculture uses more than two-thirds of Earth’s fresh water, so the idea of a farming practice that produces more water and energy than it consumes seems too good to be true. But in the desert of Qatar, scientists are showing that salt water and sunlight can yield food and clean water in a self-sustaining cycle.

The Sahara Forest Project (SFP), a Norwegian company launched in 2009 and supported by the Oslo-based fertilizer company Yara and the Qatar Fertilizer Company of Mesaieed, operates an $8.5-million pilot facility outside Doha. Last year, the 700-square-metre greenhouse produced a crop of vegetables comparable to that of commercial greenhouses in Europe, according to SFP.

Greenhouses normally trap heat, but the reverse is required in hot places such as Qatar. At the SFP facility, sea water does the trick. The water, piped from the ocean just 100 metres away, trickles over a lattice at the windward side of the greenhouse. As the water evaporates, it humidifies the air entering the greenhouse and cools it by some 10°C, creating an indoor climate suitable for growing vegetables such as cucumbers and tomatoes. Other crops, such as barley, salad rocket and useful desert plants, grow between hedges downwind of the greenhouse.

When the desert cools at night, water condenses on surfaces inside the greenhouse and is collected for irrigation and drinking. A desalination facility at the site produces further fresh water. And the electricity needed to run the entire installation comes from solar power.

Joakim Hauge, chief executive of the SFP in Oslo, believes that the concept can be scaled up to create green oases in desert climates that are otherwise hostile to farming. “With 60 hectares of greenhouse production we could match the yearly import of cucumbers, tomatoes, peppers and aubergines to Qatar,” he says.

The company is working with the government of Jordan to set up a 20-hectare pilot facility, including a commercial greenhouse unit and a research and innovation centre, in Aqaba. A larger commercial facility, says Hauge, would be able to produce excess electricity that could be exported to the grid.

The concept might work in any dry and 
sunny location that is near sea level, and there- 
fore has low pumping costs. Even so, saltwater 
greenhouses remain an experiment for now, 
says Nina Fedoroff, director of the Center for 
Desert Agriculture at KAUST. “The concept 
is intriguing,” she says. “But it is still a rather
pricey way of producing food that might not
gain huge commercial traction.”


328 | NATURE | VOL 510 | 19 JUNE 2014


For as long as people can remember, women in the small mountain village of Tojquia, Guatemala, have had to trek down to the valley bottom during the dry winter months and haul fresh water back uphill to their families. But now they can get their water by wringing moisture from the fog that often envelops their community.

One cubic metre of fog can contain up to 0.5 grams of liquid water, and harvesting it is 
relatively easy. A large vertical mesh panel can collect water droplets as the wind pushes 
clouds of moisture through its fibres. Tiny at first, the droplets coalesce and grow, then 
run into a gutter at the bottom and into a storage tank. 

At 3,300 metres above sea level, where winters are windy and dry but often foggy, 
Tojquia is an ideal site for this technique. With the help of researchers from the non-profit 
FogQuest project in Kamloops, Canada, the residents of Tojquia have installed 35 collectors 
since 2006. These produce an average of 6,300 litres of potable water per day — enough 
for about 30 families during the dry season — and considerably more in the wet season 
when rainwater, too, is collected in the storage tanks. 

Fog collection is catching on in seasonally dry regions that lack other sources of fresh 
water. The first simple mesh panels were built in the 1960s in the port town of Antofagasta 
in northern Chile. Today, 35 countries are using the technique, particularly along the Pacific 
coast of South and Central America, in the Atlas Mountains in Morocco and on the high 
plateaux of Eritrea and Nepal. 

Improvements could come from advanced mesh materials, such as the permeable fibres 
developed by scientists at the Massachusetts Institute of Technology in Cambridge; when 
tested in Chile, these collected fog at a rate five times that of conventional mesh. And in the 
Namib Desert in Namibia, three-dimensional meshes developed at the Institute of Textile 
Technology and Process Engineering in Denkendorf, Germany, have achieved up to three 
times higher water yields than normal meshes. 

Even with those kinds of gains, fog harvesting will not solve Chile’s — or any other country’s — water shortages. But it can provide a simple and sustainable method of producing fresh water in semi-arid regions that are short of other options, says Otto Klemm, a climatologist at the University of Münster in Germany.

“If the climatic conditions are right — and, importantly, if local people are trained to 
independently maintain the facilities,” he says, “it does have the potential of supplying 
rural communities with precious fresh water year-round.”


A fog collector in 
the hills above 
Lima, Peru. 


Quirin Schiermeier reports for Nature from Munich in Germany. 
The inside track


Members of the US National Academy of Sciences have long enjoyed a 
privileged path to publication in the body’s prominent house journal. 
Meet the scientists who use it most heavily. 


In April, the US National Academy of Sciences elected 105 new members to its ranks. Academy membership is one of the most prestigious honours for a scientist, and it comes with a tangible perk: members can submit up to four papers per year to the body’s high-profile journal, the venerable Proceedings of the National Academy of Sciences (PNAS), through the ‘contributed’ publication track. This unusual process allows authors to choose who will review their paper and how to respond to those reviewers’ comments.

For many academy members, this privileged path is central to the appeal of PNAS. But to some scientists, it gives the journal the appearance of an old boys’ club. “Sound anachronistic? It is,” wrote biochemist Steve Caplan of the University of Nebraska, Omaha, in a 2011 blogpost that suggested the contributed track could be used as a “dumping ground” for some papers. Editors at the journal have strived to dispel that perception.

Who are the power users?
Just 13 members of the US National Academy of Sciences consistently published three or more papers per year in the ‘contributed track’ at PNAS during the past decade. ‘Other’ papers include direct submissions, reviewed in the normal way, and papers contributed or communicated by other members. Key: Nobel prizewinner; member of PNAS editorial board; former member of editorial board.

With PNAS currently celebrating its 
centenary, the news team at Nature decided to 
examine the contributed track, both to assess 
its scientific impact and to see which members 
use it most heavily and why. After analysing a 
decade’s worth of PNAS papers, we found that 
only a small number of scientists have used the 
track at close to the maximum allowable rate. 
The group includes some of the biggest names 
in science, and six are past or current members 
of the journal’s editorial board. These scien- 
tists say that the main motivator for using the 
contributed track is an intense frustration with 
the peer-review process at other high-profile 
journals, which they argue has become exces- 
sive and laborious. 

Our analysis also suggests that the efforts by PNAS to prevent abuse of the contributed track and to boost the quality of papers published by this route are bearing fruit. Although contributed PNAS papers attract fewer citations than those handled through the journal’s standard review process, the gap has narrowed in recent years. “We have worked really hard at this,” says Alan Fersht, a biophysicist at the University of Cambridge, UK, one of PNAS’s associate editors and a heavy user of the contributed track.


A PRIVILEGE TO PUBLISH 

An inside track to publication for academy 
members rests deep in PNAS’s DNA. The journal was established in 1914 with the explicit goal of publishing members’ “more important contributions to research” in addition to “work
that appears to a member to be of particular 
importance”. That remit led to the creation 
of two publishing tracks: contributed and 
‘communicated’ papers (manuscripts sent by 
non-members to colleagues in the academy, 
who would shepherd them through review). 




These two tracks were the only ways to get a 
paper into PNAS until 1995, when biochemist 
Nicholas Cozzarelli of the University of Cali- 
fornia, Berkeley, took over as editor-in-chief 
and introduced ‘direct submissions’, which are handled more like papers at other journals. Direct submissions must pass an initial screen by a member of the editorial board, after
which they are assigned to an independent edi- 
tor — either an academy member or a guest 
editor — who organizes peer review. 

Starting in 1972, the journal placed limits on 
the number of contributed papers that an acad- 
emy member could submit, and the current 
annual cap of four was imposed in 1996. Then 
in 2010, PNAS abolished the communicated 
track, which was already declining in popu- 
larity’. Today, more than three-quarters of the 
papers published in the journal are direct sub- 
missions. These papers are much less likely to 
be accepted than those contributed by academy 
members. Only 18% of direct submissions were 
published in 2013, whereas more than 98% of 
contributed papers were published, according to 
figures on the journal's website. (The one caveat 
is that PNAS has no data on how many papers 
intended for the contributed track receive nega- 
tive reviews and never get submitted.) 

Despite the impressive acceptance rate for 
contributed papers, the data collected show 
that many eligible scientists choose not to 
submit papers through this track. Of the more 
than 3,100 academy members who could have 
used the contributed track between 2004 and 
2013, fewer than 1,400 scientists did so. (This 
might in part reflect where researchers from 
different fields prefer to publish their work; 
the academy draws its members from all dis- 
ciplines, including researchers from fields such 
as astronomy and mathematics, who rarely 
send their papers to PNAS.) Most members 
who used the contributed track did so spar- 
ingly: the majority published on average fewer 
than one contributed paper per year. Only a small group consistently used the track at close to the allowable maximum: from 2004 to 2013, 13 scientists each contributed more than 30 of their own papers. This roster includes some of the best-known people in contemporary science (see ‘Who are the power users?’).

Some of these researchers, such as Solomon Snyder, a neuroscientist at Johns Hopkins University in Baltimore, Maryland, rarely or never publish in PNAS except through the contributed track. But others, including immunologist Tak Mak at the University of Toronto in Canada and cancer researcher Carlo Croce at Ohio State University in Columbus, also regularly send in direct submissions.

Having control over the review process brings advantages. Those who work across disciplinary boundaries say that being able to choose your own reviewers is the best way to ensure that referees actually understand the material. “Chemists have no idea about glycobiology,” says Chi-Huey Wong of the Scripps Research Institute in La Jolla, California, who studies the chemistry and biology of sugars.

But for others, including Croce, who consistently hits his annual allocation of four contributed papers, the track’s appeal boils down to one word: speed. Several of the contributed track’s most regular users say that they have had papers held in limbo for up to two years at Nature, Science or Cell while the manuscripts went through multiple reviews and revisions. “In two years, you can be scooped over and over and over,” says Croce.

Science and Nature each provided figures for the median time between submission and publication for recent papers, which suggest longer lag times than for contributed articles at PNAS. Cell declined to provide figures. However, comparing across journals is difficult because each has different policies on when a revised manuscript is considered a ‘new’ submission.

Still, many of the contributed track’s power users believe that increased competition for space in high-profile journals has allowed editors and reviewers to become more demanding. “Being able to publish four high-profile papers with much less grief than the usual high-prestige journal — that’s worth something,” says Snyder. Some of the power users, including Snyder and Mak, add that the contributed track benefits postdoctoral researchers or students in their laboratories who are searching for jobs and need high-profile publications more quickly than the review time at Nature or Science would allow.

Complaints about nitpicking reviews at 
Nature and Science go hand-in-hand with the 
charge that the editors at these journals are in 
thrall to trendy areas of research. “Very often 
what seems to be fashionable is not very good 
science,” says Croce. 


SPECIAL ACCESS 

The problem for most scientists looking to 
advance their career, however, is that they do 
not have the option of turning to PNAS’s con- 
tributed track. No wonder, then, that succes- 
sive editors-in-chief have been dogged by the 
view that PNAS is a club for academy mem- 
bers. “We want to remove this perception,” 
says current editor-in-chief Inder Verma, a 
gene-therapy researcher at the Salk Institute 
for Biological Studies in La Jolla. 


The steady growth of direct submissions
bears witness to efforts by Verma and his 
predecessors to make the journal attractive 
to scientists who are not academy members 
(see ‘A changing journal’). “When I was edi- 
tor, I was very concerned about the abuse of 
members’ privilege,” says Randy Schekman 
of the University of California, Berkeley, a 
former PNAS editor-in-chief, under whose 
watch communicated papers were abolished 
(see Nature http://doi.org/d22bqx; 2009). 
Academy members were consulted on that 
decision, and it was a popular one — probably 
because it freed members from having to deal 
with submission requests from colleagues. 

But it would be far more difficult to convince 
members to give up their own publishing 
privileges. Even the contributed 
track's critics accept that it is here 
to stay, at least for the foresee- 
able future. “I'd just do away with 
it,” says applied physicist David 
Weitz of Harvard University in 
Cambridge, Massachusetts. “But 
it’s something that many mem- 
bers of the academy have viewed 
as their prerogative.” Weitz, who 
sits on the PNAS editorial board, 
publishes some of his best work 
in the journal, but has a policy of 
never using the contributed track. 
“I don’t want to have a special ‘in’,”
he says.

The contributed track’s most 
enthusiastic users argue that 
their papers get thoroughly 
reviewed. “The referees I choose 
are people I hardly know but who 
can give the best review of the
papers — so I don't get egg on my
face,” says Fersht. “It’s not a free
ride,” agrees Mak, who adds that
his haul of contributed PNAS papers should 
be viewed against his high productivity over- 
all. His laboratory published more than 300 
original research papers over the same decade. 
Many of the other power users head similarly 
productive labs. 

PNAS has also tried to limit conflicts of 
interest by barring members from picking 
recent collaborators to referee their papers. 
Current rules prohibit members from choos- 
ing any scientist they have worked with in 
the past four years. The journal’s editorial 
board can also step in to block contributed 
papers if it feels that members are abusing 
their privileges, a process that Schekman 
says took a considerable amount of time and 
effort during his tenure. Telling big-name 
scientists — some with egos to match — that 
their work isn’t up to snuff can be difficult. 
“We would challenge these papers, and peo- 
ple would take umbrage and personally attack 
me,” says Schekman. “It was discouraging to
have to deal with that, but I was unbowed.” 
Verma continues the fight, taking a wry view. 


“Every member of the academy is a legend in
their own mind,” he jokes.

As well as providing oversight for the con- 
tributed track, the nearly 200-strong PNAS 
editorial board includes some of the track’s 
most enthusiastic users. Our analysis shows 
that almost half of those who contributed more 
than 30 papers over the past decade are current 
or former members of the board — including 
Fersht, Mak and Snyder. These scientists work 
hard for PNAS: none more so than Snyder, who 
has organized the review of hundreds of direct- 
submission papers over the past decade. 

Verma is adamant that there is no prefer- 
ential treatment for those who sit on the jour- 
nal’s editorial board. Still, he acknowledges 
that the perk of the contributed track helps to
explain how the journal can operate without
professional editors. Fersht agrees: “Members
are willing to act as editors, and part of it is
because they know they are able to publish
their own papers.” Verma says that more than
1,200 members of the academy responded to
the call to edit one or more papers in 2013, and
he argues that collective editing by leading sci-
entists is the journal's main strength.

A changing journal

The number of direct submissions to Proceedings of the National Academy of
Sciences has been increasing steadily over the past decade. Communicated papers
were phased out in 2010, but the contributed track has remained constant.

But all of that does not quell criticism of 
the contributed track, and there is evidence 
that contributed papers have less impact than 
those reviewed in the usual way. In 2009, 
psychologist David Rand and evolutionary 
biologist Thomas Pfeiffer, then both at Har- 
vard University, looked at citations to papers 
published in PNAS between June 2004 and 
April 2005. Controlling for factors such as
scientific discipline and time elapsed since
publication, the pair found that contributed
papers were cited less often than direct
submissions and communicated papers’. (By the time
Rand and Pfeiffer published their analysis,
PNAS had already decided to abolish the com-
municated track.)

> NATURE.COM
For more information on how the analysis was done, see: go.nature.com/gm7ode

Although citations are not the only way to 
judge the impact of papers, they are the most 
readily available and widely researched meas- 
ure. We repeated and extended Rand and 
Pfeiffer’s analysis, considering papers pub- 
lished from 2004 to 2011. Overall, the con- 
clusion was the same: the difference between 
citation rates for directly submitted and con- 
tributed papers was not large — controlling 
for other factors such as discipline, contrib- 
uted papers garnered about 4.5% fewer cita- 
tions — but it was statistically significant. 
Nature’s analysis also suggests that the gap 
in citation rates between directly 
submitted and contributed papers 
has been narrowing, and this does 
not seem to be because more- 
recent papers have yet to acquire 
enough citations for the difference 
to show. 

Viewed in this light, the journal 
seems to be making progress with 
its efforts to eliminate the abuse 
of publishing privileges by acad- 
emy members. And Verma vows 
to keep up the pressure. He is now 
encouraging academy members 
to list the reviewers for contrib- 
uted papers, taking the lead by 
doing so for his own most recent 
contribution’. Such transparency, 
he hopes, will hold everyone to 
rigorous standards. 

Verma also wants to eliminate 
what some scientists see as a ves- 
tige of the old communicated 
track — an option to request a 
‘prearranged editor’ from the 
academy. One in five direct submissions pub- 
lished in 2013 used a prearranged editor, and 
the acceptance rate for these papers is higher 
than for other direct submissions. “More
and more the playing field will be levelled,”
says Verma.

As PNAS marches into its second century, 
debate about its idiosyncratic publishing mech- 
anisms is sure to continue. But for those who 
benefit from the journal's distinctive approach, 
PNAS’s quirks are inherent to its appeal. “The 
last thing we need, I think, is less diversity,” 
argues Nobel-prizewinning neuroscientist 
Thomas Südhof of Stanford University in Cali-
fornia. “Turning PNAS into a standard journal,
in my view, would make it unnecessary.”


Peter Aldhous is a science journalist in San 
Francisco, California. 
Italian stem-cell researcher Elena Cattaneo.


Taking a stand against 
pseudoscience 


Elena Cattaneo and Gilberto Corbellini are among the academics working to protect 
patients from questionable stem-cell therapies. Here, they share their experiences 
and opinions of the long, hard fight for evidence to prevail. 


Scientists get the most satisfaction from
working long hours at the bench with
like-minded colleagues, but some-
times their duty lies elsewhere, even if it
means missing grant deadlines and receiving
threatening letters. When lax clinical stand-
ards endangered Italy's health-care system
and patients, we were among those who left
the comfort of our labs and offices to fight
for evidence to prevail.

Since its creation in 2009, the Stamina 
Foundation, a private organization in Italy, 
has been claiming that stem cells collected 
from human bone marrow can be trans- 
formed into neural cells by exposure to 
retinoic acid, an important molecule in 
embryonic development. Stamina’s founder
Davide Vannoni, who has not trained as
a scientist or physician, holds that injec-
tions with these cells can treat conditions
as diverse as Parkinson’s disease, muscular
dystrophy and spinal muscular atrophy. He
has not published in the peer-reviewed lit-
erature. (PubMed searches for Vannoni
with the key words ‘stem cell’ or ‘neuron’
return nothing.) He has moved his labora-
tory around and outside Italy, stating a desire
to work where regulations are less strict.

STAMINA SAGA

The ups and downs of Italy’s struggle
with stem-cell-therapy claims.

2011
The Stamina Foundation, founded by Davide
Vannoni, sets up operations in a
public hospital in Brescia, Italy.

MAY 2012
The Italian Medicines Agency shuts down
Stamina operations because of safety concerns.

MARCH 2013
Italian health minister allows Stamina
treatments to continue; 13 leading Italian
stem-cell scientists write a letter in protest.

MAY 2013
Italian government agrees to sponsor
clinical trial of Stamina’s procedure.

JULY 2013
Data in Stamina patent application
found to be flawed.

AUGUST 2013
Elena Cattaneo appointed as lifetime senator in
Italian Senate; Stamina investigations continue.

OCTOBER 2013
Trial plans halted after scientific committee
identifies problems with Stamina’s protocol.

DECEMBER 2013
Decision made to form new committee to
re-investigate Stamina protocol.

JANUARY 2014
Paolo Bianco, Cattaneo and Michele De Luca
win public-service award from the International
Society for Stem Cell Research.

APRIL 2014
Public prosecutor accuses Stamina founder
of attempted fraud, and him and others
of criminal conspiracy.

MAY 2014
European court rules that ‘compassionate
therapy’ requires scientific evidence.

Multiple scientists and government 
officials have found that Stamina’s cell- 
preparation protocols are flawed and that 
evidence that the treatments work is wanting. 
Nonetheless, Italy’s national health services 
paid for some of these procedures, and the 
Italian parliament even agreed to sponsor a 
€3-million (US$3.9-million) clinical trial. 

For most of the past two years, we and 
others (especially stem-cell specialists 
Paolo Bianco and Michele De Luca) have 
spoken out against these treatments. We 
have had to miss grant deadlines and profes- 
sional meetings to make our case. We have 
learned to apply our investigational abili- 
ties outside our disciplines, and have come 
to appreciate the skills involved in helping 
non-scientists to grasp the value of evidence, 
rigour and risk assessment. 

Our most recent victory came on 28 May, 
with the release of a ruling from the Euro- 
pean Court of Human Rights that patients 
have no right to receive therapies for which 
there is no scientific evidence. But we are not 
ready to relax. Earlier this month, Marino 
Andolina, the Stamina Foundation’s vice- 
president, was appointed acting commis- 
sioner of the public hospital in Brescia, in 
northern Italy, where the foundation still 
operates; a court gave him the go-ahead to 
give a child the ‘Stamina treatment’.

Desperate patients will always be vulnera- 
ble to exploitation. We hope that sharing our 
experience — and we learned some lessons 
the hard way — will help other investigators 
to join the fight against predatory pseudo- 
science. 


INTO THE FRAY 

We first became aware of Stamina’s claims 
in August 2012. Three months before, 
inspectors from the Italian Medicines 
Agency had shut down Stamina’s opera- 
tions at the hospital in Brescia, deem- 
ing its cell-preparation methods unsafe. 
Patient groups responded with lawsuits, 
demanding that the ‘Stamina method’ be 
made available for anyone with a terminal 
illness and for its costs to be covered by 
Italy’s public health services. 

In August 2012, one Italian court ruled 
that a child with spinal muscular atrophy 
could receive the treatment. Since then, 
the majority of the 500 courts that patients 
turned to decided in favour of the treat- 
ment and ordered its administration in the 
Brescia hospital. 

In winter 2012, we and others began 
alerting patients, politicians and the press 
— writing articles and giving dozens 
of interviews every week — to the view 
that the method lacked both regulatory 
precedent and scientific rationale and did
not qualify for compassionate use. 

Together with De Luca and Bianco, we 
began scrutinizing websites and Facebook 
pages into the small hours. We found that 
although Stamina presented itself as a private 
charitable organization, its address was that 
of a commercial company, Medestea, which
had been fined for misleading advertising 
for dietary supplements. We began to collect 
evidence that Vannoni was trying to lobby 
government officials and members of parlia- 
ment to have his operations exempted from 
regulatory oversight and to have national 
health plans cover untested protocols. We 
found that Stamina’s patent applications had 
been rejected because the US patent office 
found they lacked specificity, stating in part 
that it was unlikely that collected cells could 
be induced to form desired types under the 
conditions described. But no one — not the 
journalists, public-health authorities or hos- 
pital physicians — had bothered to dig. We 
began talking daily with officers in the health 
unit of the Italian police. 

By early 2013, those of us objecting to 
Stamina were being vilified by Vannoni and 
by some media outlets as keeping children 
from life-saving treatments. The evidence, 
which a small group of us had spent months 
collecting and distributing, was largely 
ignored. We knew that there can be no com- 
passion without safety and efficacy, and that 
we needed to stay vocal, lucid and rational. 
Most of all, we had to avoid succumbing to 
the feeling that we had done all that we could 
be expected to do. 


ON TRIAL 

We prepared 40-page dossiers for every 
politician whom we could reach, and the 
legislature held hearings for Stamina advo- 
cates and challengers to make their cases. 
Vannoni was unable even to remember 
the names of the clinicians with whom he 
worked. 

In May 2013, the government promised 
to pay for a $3.9-million clinical trial, even 
though Vannoni had not presented evidence 
from animal or cell-based studies, or even 
established cell-preparation protocols that 
guard against contamination. Here was a 
dilemma: the trial would be an appalling 
waste of meagre public money, yet some 
of us thought that it would be better than 
unknown cells being injected into children. 
At least for a rigorous trial, cells would be 
prepared by an authorized laboratory under 
strict quality controls and the protocol 
could be scrutinized. 

In August 2013, the Italian President 
Giorgio Napolitano appointed one of us 
(E.C.) and the Nobel-prizewinning physi- 
cist Carlo Rubbia as Senators for Life in the 
upper house of the legislature — positions 
that are usually reserved for politicians. 
Patient advocates campaigning in November 2013 for access to the Stamina method.


The appointments, part of an effort to 
strengthen science in Italy, gave our band 
of researchers investigating Stamina greater 
access to politicians. 

As part of the requirements for the clinical 
trial, Vannoni revealed his putative method 
for preparing cells. A scientific committee 
appointed by Italy’s health minister found, 
among other shortcomings, that the method 
included flawed techniques to assess cells’ 
identity and lacked basic screens for path- 
ogens. An earlier analysis of frozen cells 
collected from Stamina found only blood 
cells, and no neurons. Plans to begin the trial 
were cancelled in October 2013. 

In December 2013, another court ruled 
that any committee members who had previ- 
ously spoken publicly against Stamina were 
biased, and called for the creation of another 
committee to re-examine the protocols. That 
same month, the health ministry said that 
the condition of three dozen patients treated 
with Stamina’s protocols had not improved.
(Vannoni maintains that patients’ conditions 
did improve.) 

Last month, the International Journal of 
Stem Cells published a single-author paper 
by Andolina, describing a boy with a severe 
neurodegenerative disease who had been 
injected with cells from his father (M. Ando- 
lina Int. J. Stem Cells 7, 30-32; 2014). The 
three-page paper contains no figures, no 
detailed methods and no supplementary 
materials, yet states that the boy’s “move-
ments [and] relationship with the parents”
improved. Even more bafflingly, the author
declares that he has “no conflicting financial 
interest”. Last week, some scientists wrote to 
the journal about these concerns. 


Meanwhile, Stamina’s case continues 
to unravel. Italian police are looking into 
accusations against the foundation from 
patients’ relatives. In April, after a four-year 
investigation, a public prosecutor accused 
Vannoni of attempting to fraudulently 
obtain public money, and along with some 
physicians and civil servants, also of criminal 
conspiracy. A judge will determine whether 
the cases will go to trial. Vannoni maintains 
that he is innocent of this and other charges. 


FIGHTING FOR RIGHT 
Our crusade has come at a high personal 
cost. The past 18 months have been a roller 
coaster of hope, disappointment, triumph 
and outrage. We have spent countless hours 
talking to each other and to politicians on the 
phone, in person and on video conferences. 
We prepared and shared at least six dossiers 
and dozens of slides. We have given inter- 
views to newspapers and written commen- 
taries almost weekly. We exchanged letters 
and comments with patient organizations; 
we established relationships with doctors at 
the public hospital that had housed Stamina, 
which has now distanced itself from Vannoni. 
Every morning, we reviewed the battlefield 
in detail. We had to be prepared to change 
plans at the last minute when Stamina won a 
media, political or regulatory skirmish. Since 
June 2013, both of us, along with De Luca 
and Bianco, have been repeatedly asked by 
students’ associations, university professors, 
science-festival organizers, patient asso- 
ciations and other groups to give lectures 
on the Stamina case. We never turn down 
these requests. Those of us who run research 
groups (E.C., De Luca and Bianco) estimate 
that we have each sacrificed 60-80 weeks of
lab time so far and have delayed submitting 
papers. We often catch up with our students 
and lab members at night and by e-mail. 

We learned to avoid appearing on televi- 
sion shows on which cool reason is drowned 
out by strong emotional messages. Over sev- 
eral months, some of us received threatening 
letters and insults from people who felt that 
we lacked compassion for dying patients. 
Several of these letters were serious enough 
that we forwarded them to police. Our insti- 
tutions filed complaints against unknown 
people hanging around our labs. Our uni- 
versities were the target of e-mail and other 
cyberattacks. 

Gathering support from the international 
community has proved invaluable. It under- 
lined that we were not just local trouble- 
makers, but had worldwide backing. An 
advocacy award given to E.C., Bianco and 
De Luca by the International Society for 
Stem Cell Research boosted our credibility 
in Italy, as did statements from Nobel laure- 
ate and stem-cell pioneer Shinya Yamanaka 
and publications in the scientific literature. 

At home, finding the right allies and getting 
the best from them was key. We need to be 
able to talk with everyone, regardless of their 
scientific knowledge — from taxi drivers to 
lawyers. Some people welcome the documen- 
tation and persistence that comes naturally to 
a scientist. Others want to debate values and 
opinions; it is important to respect and engage 
with this, steadily explaining the difference 
between beliefs and facts. 

Nurturing relationships with fellow scien-
tists involved in the battle was also key. We
had to learn to be generous and to remem-
ber that we shared a single goal. In public
advocacy, the prima donna attitude is not
helpful. Maintaining valid and effective
political and communicative actions
requires a united front.

But it has all been worth it. Now, thanks 
to the European Court ruling and a Senate 
investigation into the case that launched three 
months ago, we are hopeful that these dubious 
treatments will soon be banished from Italy; 
they were displaced from Switzerland in 2011 
and from Cape Verde earlier this year. We rec- 
ommend that all scientists stand up for the 
scientific method. Science depends on public 
institutions and is done in the public interest 
— we have a duty to defend both.


Elena Cattaneo is at the Department
of Biosciences and director of the Centre
for Stem Cell Research at the University
of Milan, Italy. Gilberto Corbellini is a
historian of medicine and a bioethicist at the
University of Rome La Sapienza, Italy.
e-mail: elena.cattaneo@unimi.it
COMMENT


Sell help not hope 


Stem cells are being used as a wedge in calls to allow unproven medical 
interventions onto the market, warn Paolo Bianco and Douglas Sipp. 


Modern medicine depends on
products that must pass rigorous
tests for safety and efficacy before
being marketed to patients. Such require-
ments are in place for drugs and other
medical products across the world. Over
the past decade, however, some have called
for this key protection to be weakened or even undone.

Think tanks in the United States are using 
stem cells to promote broader deregulation; 
these moves are influencing policy in other 
countries. Some argue that stem-cell prod- 
ucts and procedures should not be governed 
by drug regulatory agencies at all; others 
want to bypass requirements that treatments 
must be shown to work before they are sold. 

‘Free-to-choose’ reasoning pits the scien- 
tific method against unrestrained market 
forces. But there is little correlation between 
business success and efficacy in poorly 
regulated markets; the billions of dollars in 
revenue from nutritional supplements and 
homeopathy bear testament to that. 

A loosening of the regulatory strictures 
would enable companies and practitioners 
to generate revenue from untested products 
and procedures. Patients would, in effect,
pay to serve as research subjects. Worse, 
with no requirement to demonstrate effi- 
cacy, there would be less need for research, 
so new treatments might not be discovered 
and developed. What is needed are better 
business models for bringing innovative 
medical technologies to market, not lower 
standards. 


BUYER BEWARE 

Three key documents give a sense of what 
is at stake. Under the Free to Choose Medi- 
cine campaign put forward in 2010 by the 
Heartland Institute in Chicago, Illinois, US 
companies would be able to sell drugs after 
small clinical trials that are insufficient to 
establish either safety or efficacy. 

The campaign’s language is echoed in 
the bill for the Compassionate Freedom of 
Choice Act put forward to the US Congress 
in April. This would exempt from liability 
those who sell investigational products 
to people who are terminally ill and pro- 
hibit the US Food and Drug Administra- 
tion (FDA) from requiring companies 
even to report clinical information.

The Right to Try legislation proposed by 
the Goldwater Institute in Phoenix, Arizona, 
goes even further. Written as a template for 
state legislatures, the document calls for 
criminal penalties for public employees 
who seek to enforce regulations on physi- 
cians who are selling investigational drugs, 
biological products and devices to people 
who are terminally ill. 

Although the above proposals target medi- 
cal products and practice in general, in the 
past three years, stem cells and regenerative 
medicine have become the rallying cry of 
the free-to-choose lobby. In an opinion piece 
published’ in 2012 in The Wall Street Jour- 
nal, former FDA commissioner Andrew von 
Eschenbach, who now heads Project FDA at 
the Manhattan Institute for Policy Research in 
New York, wrote that regenerative-medicine 
products and other ‘promising’ therapies 
should be allowed onto the market after 
proof-of-concept and safety testing, and be 
evaluated only afterwards for efficacy. In the 
same newspaper, another former senior FDA 
official, Scott Gottlieb, now at the American
Enterprise Institute for Public Policy Research
in Washington DC, also called’ for loosening 
approval standards specifically for stem-cell 
products, warning readers that “the FDA 
wants to regulate your cells”. 

The fiercest arguments are over autologous 
cells — those collected from and delivered 
back to the same patient. Courts, scientists, 
clinicians and ethicists have argued that 
stem-cell-based products should be regu- 
lated as drugs if they are processed or if their 
intended therapeutic behaviour differs from 
that in their original location. 

Proponents of deregulation counter that 
autologous cell products should be treated 
as part of medical practice and thus not sub- 
jected to marketing approval. For example, 
in legal battles against the FDA, Regenerative 
Sciences of Broomfield, Colorado, argued 
that the agency did not have oversight over 
its human-cell-based products. Two con- 
servative medical groups filed arguments 
in the company’s defence, and the Manhat- 
tan Institute published a legal analysis sup- 
porting the company’s position. US courts 
ultimately upheld the FDA’s authority, sus-
taining its ability to regulate products based
on human cells and tissues. 

In a separate case in 2012, the FDA issued 
a warning letter to Celltex Therapeutics in 
Houston, Texas, after the state had moved 
to allow physicians to market investiga- 
tional stem-cell products. The company 
subsequently shifted its clinical operations 
to Mexico; it posts the Manhattan Institute's 
position paper on its website. 


RELAXED MARKETS 

Several countries in Asia, Latin America and 
the Caribbean have already punted stem cells 
straight onto the market, often in concert with 
state-backed initiatives to promote medical 
tourism. Australia has exempted autologous 
cells from the purview of its drug regulatory 
agency, the Therapeutic Goods Administra- 
tion, unleashing offers of unproven treat- 
ments from at least a dozen clinics. 

In the controversy surrounding the Stam- 
ina Foundation in Italy’, which offers an 
unproven stem-cell treatment for a range of 
conditions, US advocates of free-to-choose- 
medicine last year pressed the Italian gov- 
ernment to allow entities to market stem 
cells for diseases such as ischaemic heart dis- 
ease and multiple sclerosis without requir- 
ing any proof of efficacy, and only a small 
phase I clinical trial to evaluate safety. This 
prompted alarm and counter-arguments 
from scientists (see page 333). Earlier this 
year, a nearly identical deregulatory proposal 
was published‘ as a call for stem-cell prod- 
ucts to be placed on the market first, and 
tested for clinical effectiveness later. 

In November 2013, Japan enacted a 
regulatory regime that allows companies 
to market ‘regenerative-medicine’ products
that have shown nominal safety and inklings
of efficacy in phase I trials for up to seven 
years without presenting further evidence of 
efficacy. How efficacy would be determined 
after this period is unclear. Foreign stem- 
cell companies are already lining up to enter 
Japan's lucrative market”. 


BACKWARD STEP 

Even the idea of putting products up for sale 
and into consumers’ bodies on the basis of 
phase I data is disturbing. Early-stage clinical 
trials reveal only whether a product is safe 
enough for continued testing, not for wide- 
spread use. Some 80% of products that make 
it through phase I clinical trials fail in later 
studies — about half of those proving to be 
insufficiently effective and one-fifth insuf- 
ficiently safe®. 

When test subjects are paying for the prod- 
uct under investigation, establishing efficacy 
is hard: controls, randomization, masking 
and other hallmarks of clinical research break 
down’. Many stem-cell clinics offer their pro- 
cedures for disparate conditions, further com- 
plicating post-market studies. 

Under the guise of ‘patient-funded clini-
cal trials’, clinics in the United States and
Mexico persuade people who are seriously 
ill to pay tens of thousands of dollars for pro- 
cedures*. Because such patients have been 
told that a product is experimental, they have 
little recourse when hoped-for cures fail to 
materialize. Companies can thus profit from 
selling hope. With their products already on 
the market, they have little reason to conduct 
rigorous, conclusive research. 

Advocates of deregulation suggest that 
databases of patient information could pro- 
vide the data needed to tease out efficacy. 
Aside from the fact that such databases are 
not in place, and their construction would 
require massive outlays of public money 
(something conservative groups ordinarily 
bemoan), there are also no means of ensur- 
ing compliance. More than 360 registered 
studies at ClinicalTrials.gov are listed as 
using mesenchymal stem cells as an inter- 
vention. None lists results. 

In short, proposals for deregulation come 
shrouded in appealing messages that shift 
adroitly in response to critiques: freedom 
of choice, giving hope to dying patients, 
fighting bureaucratic obstructionism and, 
of course, innovating medicine. But it is a 
business model that removes the incentives 
to make drugs and treatments ever better. 
It offloads financial risk from investors and 
companies to patients, and requires the very 
ill to pay for interventions that are unlikely 
to work. 


A BETTER WAY

The pressure to deregulate comes from
the failure of current business models as
engines of innovation. In the United States
and Europe, regulatory structures actively
support therapeutic development, even for
rare or orphan diseases. The gene-therapy
product Glybera (alipogene tiparvovec),
approved by the European Medicines
Agency in 2012 for a condition that can
cause life-threatening pancreatitis, is an
example of establishing efficacy for a com-
plex, innovative product with a small but
rigorous trial’.

© 2014 Macmillan Publishers Limited. All rights reserved

However, many companies are not
equipped, scientifically or technologically,
to develop therapies from promising biologi-
cal advances in the five or so years that their
investors demand. Much longer and
deeper commitment is needed to bring
stem-cell and other complex therapies to
market.

Regulatory agencies face a growing
challenge in gauging the merits of stem-cell
therapies. Meanwhile, governments need
better mechanisms to identify and support
radical innovation. Untapped tools include
public investment banks and equity shares”.

For stem-cell research to achieve its thera- 
peutic potential, science, medicine, economy 
and policy all must work together. If, as has 
been widely asserted, stem cells represent the 
future of medicine, then we need to ensure 
that that future is one in which patients can 
reasonably expect treatments to be both safe 
and effective.


Paolo Bianco is a stem-cell biologist and 
professor of pathology at the University of 
Rome La Sapienza, Italy, and is an editor of 
Stem Cell Research. Douglas Sipp is head 
of the Office for Research Communication 
at the RIKEN Center for Developmental 
Biology in Kobe, Japan. 
The chloroplast (pale green oval)
originated when a cyanobacterium 
formed a symbiotic partnership with 
an ancestral plant cell. 


EVOLUTION 


The complexity chronicles 


Nancy Moran enjoys a treatise on symbiosis — the intimate association of species that
transformed life and Earth.


In One Plus One Equals One, John Archibald melds two epic stories. One
is the 3.8-billion-year tale of the funda-
mental biochemical inventions that underlie
life on Earth, and how they were swapped
and merged to produce complex life forms.
The second follows the scientists who first
mapped the domains of life and finally
proved the central evolutionary role of sym-
biosis — the intimate associations between
two or more distinct species.

Cells originated, became complex and
expanded their capabilities: events that, as
Archibald puts it, “led to a transformation
of ocean, land, and atmosphere”. He relates
the scientific struggles behind the discover-
ies of these events with an appreciation of
the strategies used. The microbiologist Carl
Woese, for example, catalogued ribosomal
RNA fragments harvested from large vol-
umes of microbial cultures to transform
our understanding of the tree of life. He
delineated the ancient lineages that were
later recognized as the fundamental players
in the symbiotic formation of complex cells.

Archibald also offers glimpses into the
personalities of these pioneers. I loved the
story of how, in 1978, Woese was so eager to 
see how the first determined ribosomal RNA 
gene sequence (from Escherichia coli) fitted 
with his own data that he could not wait for 
his issue of FEBS Letters to arrive in Illinois 
by post. Instead, he called biochemist Ford 
Doolittle in Halifax, Canada — who already 
had a copy — and got him to read out the 
1,542-letter sequence 
over the telephone. 
The origins of mito- 
chondria and chloro- 
plasts from bacterial 
ancestors are arguably 
the two biggest inno- 
vations in the history 
of life. In eukaryotes, 
organisms in which 
the genomic DNA of 
each cell is packaged 
ina membrane-bound 
nucleus, mitochondria 


One Plus One 
Equals One: 
Symbiosis and 
the Evolution of 


Complex Life 
serve as energy facto- JOHN ARCHIBALD 
ries; plants and algae — Oxford University 
also have chloroplasts Press: 2014. 
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that allow the harvest and storage of energy 
from sunlight. As their remnant genomes 
show, mitochondria and chloroplasts 
each arose from a specific bacterial group 
(a-proteobacteria and cyanobacteria, 
respectively). And each arose from a single 
endosymbiotic event in which the bacterium 
was engulfed by an ancestral cell that ‘chose’ 
to coexist with it, rather than digest it. Those 
two choices made all the difference. 

From 1905, when Constantin Mere- 
schkowsky first postulated that higher 
plants depend on “little green slaves” (chlo- 
roplasts), until the 1980s, the endosymbiotic 
theory provoked controversy. The evolution- 
ary biologist Lynn Margulis was a proponent 
from the late 1960s onwards, along with 
botanist Peter Raven and microbiologist 
Jostein Goksoyr; botanist Arthur Cronquist 
was among the detractors. Proof came with 
the molecular era. As Archibald describes, a 
constellation of biologists, biochemists and 
bioinformaticians — prominently Michael 
Gray, Doolittle and Margaret Dayhoff — 
exploited molecular technologies as they 
became available. 


GEORGE CHAPMAN/VISUALS UNLIMITED/CORBIS 


The consequences of symbiosis are 
ubiquitous and ongoing. Symbiotic cells 
have themselves been engulfed as symbionts 
of hosts, from algae to insects. Archibald 
gives many examples, including the citrus 
mealybug Planococcus citri, which con- 
tains one bacterial symbiont nested within 
another. And sequencing data are revealing 
many ghosts of symbioses past, in the form 
of genes transferred between interacting 
genomes. Many nuclear genes in plants were 
transferred from the chloroplast ancestor, for 
example. 

Mysteries remain. A central one is the 
origin of eukaryotic cells. Their distinctive 
nuclei, as well as other attributes such as a 
cytoskeleton and endomembrane system, 
clearly show that these cells arose only once. 
The few eukaryotes that lack recognizable 
mitochondria, such as the protozoan para- 
site Giardia lamblia, descend from ancestors 
that had them, as evidenced by sprinklings 
of mitochondrion-derived genes in their 
nuclear genomes. If any proto-eukaryote 
had a nucleus but no mitochondrion, it left 
either no descendants, or descendants so few 
or secluded that they remain undiscovered. 

Why would the ancestral mitochondrion 
have been retained? The ‘ox-tox’ hypothesis 
posits that the mitochondrion provided an 
‘oxygen antidote’ for the anaerobic host cell,
which would have struggled to thrive in con- 
ditions of rising atmospheric oxygen. This 
seems paradoxical, because modern mito- 
chondria generate oxygen by-products that 
would have been toxic to the host. 

An alternative idea is the hydrogen 
hypothesis. This posits that the eukaryotic 
cell evolved from a separate-but-equal part- 
nership between a hydrogen-producing 
α-proteobacterium and a methane-producing
archaean. In this idea, the nuclear envelope 
arose after the symbiosis. Archibald weighs 
up the arguments, but the jury is still out. 

Just as distinct organismal lineages swap 
and combine biochemical inventions, gen- 
erating ecological breakthroughs, scientific 
disciplines exchange technology and ideas, 
instigating unexpected leaps forward. One 
could venture that molecular biology did for 
evolutionary biology what chloroplasts did 
for the eukaryotic ancestor of plants. In both 
cases, it is hard to say which side benefited 
more from the partnership. And with time, 
the merger has become so complete that the 
original duality is not evident. But tracing 
the origins of the threads from which the 
present is spun is exhilarating, for both cells 
and science. One Plus One Equals One is an 
eloquent account, at times verging on the 
poetic. With serious scholarship, it illuminates a rare scientific endeavour.


Nancy A. Moran is professor of integrative 
biology at the University of Texas at Austin. 
e-mail: nancy.moran@austin.utexas.edu 
Books in brief


How Not to Be Wrong: The Power of Mathematical Thinking 
Jordan Ellenberg PENGUIN (2014) 

Mathematicians from Charles Lutwidge Dodgson to Steven Strogatz have celebrated the power of mathematics in life and the imagination. In this hugely enjoyable exploration of everyday maths as “an atomic-powered prosthesis that you attach to your common sense”, Jordan Ellenberg joins their ranks. Ellenberg, an academic and Slate’s ‘Do the Math’ columnist, explains key principles with erudite gusto — whether poking holes in predictions of a US “obesity apocalypse”, or unpicking an attempt by psychologist B. F. Skinner to prove statistically that Shakespeare was a dud at alliteration.


Starlight Detectives: How Astronomers, Inventors, and Eccentrics Discovered the Modern Universe
Alan Hirshfeld BELLEVUE LITERARY PRESS (2014)

From 1850 to 1930, a handful of technological adepts transformed astronomy. That race to see deep space is told with palpable relish by physicist Alan Hirshfeld. Among the brilliant amateurs whose work he showcases are William Bond, Harvard University’s ‘astronomical observer’, and astrophotographic pioneer Henry Draper. No less rousing is Hirshfeld’s rendition of the coda, as Edwin Hubble — using the 2.5-metre reflector telescope at Mount Wilson, California — discovered the expansion of the Universe and opened up the cosmos.


Deep: Freediving, Renegade Science and What the Ocean Tells Us About Ourselves
James Nestor HOUGHTON MIFFLIN HARCOURT (2014)

Freediving, the sport that harnesses the mammalian dive reflex to survive deep plunges, can be a boon for marine researchers, avers James Nestor. We meet a salty cast of them, such as the “aquanauts” of Aquarius, a marine analogue of the International Space Station submerged off the Florida Keys. Equally mesmeric are Nestor’s own adventures, whether spotting bioluminescent species from a submarine in the bathypelagic zone, or freediving himself — and voyaging into humanity’s amphibious origins in the process.


The Collapse of Western Civilization: A View from the Future
Naomi Oreskes and Erik M. Conway COLUMBIA UNIVERSITY PRESS (2014)

In Merchants of Doubt (Bloomsbury, 2010), science historians Naomi Oreskes and Erik Conway laid out the costs of science denialism. In this trenchant sci-fi novella, they carry the consequences to their illogical conclusion. A future historian in the “Second People’s Republic of China” looks back at the last gasp of Western culture in 2093, drowned, burnt and broken by climate change, neoliberal-powered ignorance and market failure. Packed with salient science, smart speculation and flashes of mordant humour.


Is the Planet Full?
Ian Goldin OXFORD UNIVERSITY PRESS (2014)

Indefatigable economist Ian Goldin follows up The Butterfly Defect (Princeton University Press, 2014), on the risks of globalization, with this edited volume on the equation of planetary resources and human population. Standouts among the agile analyses are Ian Johnson’s reappraisal of the Club of Rome’s trailblazing 1972 The Limits to Growth, in which Massachusetts Institute of Technology researchers tackled the same overall question; and Goldin’s discussion of governance, ever the elephant in this particular room.

Barbara Kiser
NIH policy: mandate goes too far


The planned mandate of the US National Institutes of Health (NIH) to include both sexes in effectively all preclinical studies could undermine its own objective by wasting resources, slowing down research or even provoking a backlash (see J. A. Clayton and F. S. Collins Nature 509, 282-283; 2014). Instead of a blanket mandate, the NIH should be promoting research into the sex differences that are important to science and in disease.

Duplicating studies to 
“compare and contrast 
experimental findings in male 
and female animals and cells” 
is rarely practical, affordable, 
prudent, scientifically warranted 
or ethically justifiable. 
Researchers use both sexes 
because this roughly halves 
the costs of breeding and 
maintenance. Sometimes one 
sex is excluded if results are likely 
to differ between sexes, and 
possibly for well-known reasons 
— for instance, male rats run 
faster than female rats through a 
maze. If there is no justification 
for studying both sexes, then it 
should not be done. 

Clayton and Collins suggest 
that statistical variability will 
not be increased by using equal 
numbers of male and female cells 
or animals in studies, but this is 
questionable and undermines the 
premise for the NIH’s argument. 
If the sexes were not different,
there would be no need to use 
both. Variances are additive, 
so using both sexes halves 
sample size while increasing 
variance, making it less likely 
that an observed difference not 
due to sex can be detected at a 
statistically significant level. Thus, 
an increased number of samples 
would be needed to reach firm 
conclusions. 
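The variance argument above can be made concrete with the textbook normal-approximation sample-size formula for comparing two group means, n = 2 (z(1 - alpha/2) + z(power))^2 sd^2 / effect^2 per group. The sketch below is only an illustration under stated assumptions: normally distributed outcomes, a 50:50 mix of males and females in each group, and a sex difference in the mean that is not modelled in the analysis, so that it simply adds (difference / 2)^2 to the variance. The function names and the numbers are hypothetical and come from neither the NIH policy nor this letter.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, effect, alpha=0.05, power=0.8):
    """Animals per group for a two-sided, two-sample comparison of means
    (normal approximation): n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


def mixed_sex_sd(within_sd, sex_difference):
    """SD of a 50:50 male/female sample whose sex means differ by
    sex_difference, when sex is ignored in the analysis:
    total variance = within-sex variance + (sex_difference / 2) ** 2."""
    return (within_sd ** 2 + (sex_difference / 2) ** 2) ** 0.5


# Hypothetical numbers: within-sex SD of 1 and a treatment effect of 1 SD.
one_sex = n_per_group(sd=1.0, effect=1.0)
both_sexes = n_per_group(sd=mixed_sex_sd(1.0, sex_difference=1.0), effect=1.0)
print(one_sex, both_sexes)  # 16 20
```

With these made-up values, an unmodelled one-SD difference between the sexes raises the required group size from 16 to 20 animals, which is the kind of increase in sample numbers the letter anticipates.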

Understanding gender 
differences in disease is a goal in 
itself, but this will not be attained 
as a by-product of mandating its 
intrusion into every hypothesis 
under investigation. 


R. Douglas Fields Bethesda, 
Maryland, USA. 
douglas.fields@gmail.com 


NIH policy: status 
quo is also costly 


Researchers have raised concerns 
about the cost of requiring 
applicants for US National 
Institutes of Health (NIH) grants 
to use male and female animals 
or cells in preclinical research 
(see J. A. Clayton and F. S. Collins
Nature 509, 282-283; 2014). But 
they should also consider the costs 
of not taking sex into account: 
these include failed clinical trials, 
misdiagnosis and inappropriate 
therapies for women, and 
omission of fundamental 
biological principles. 

Many researchers are still 
unfamiliar with the distinction 
between sex and gender. Gender 
combines self- and societal 
perceptions of a person’s sex, so 
applies only to humans. Sex is 
the biological result of interplay 
between sex chromosomes and 
gonadal hormones. 

The impact of sex is dynamic, 
changing throughout lifespan 
and in response to injury and 
disease. Ruling out the influence 
of sex on a particular endpoint
will sometimes be as difficult 
as identifying it. Sex must be 
evaluated in the context of other 
variables, such as age, experience, 
genetics and environment. 

Age-appropriate medicine 
is a well-accepted idea that is 
reflected in the formation of NIH 
centres studying ageing and child 
health. The factor of sex deserves 
an equally integrative approach. 
Louise D. McCullough, Margaret
M. McCarthy, Geert J. de Vries
Organization for the Study of Sex
Differences, Washington DC, USA.
lmccullough@uchc.edu


Sharing your data is 
easier than you think 


Geoffrey Goodhill questions 
some of the practicalities of open 
data-sharing policies (Nature 509, 
33; 2014), but I believe that his
concerns are largely unfounded. 

Storing large volumes of raw 
data is costly, but many items 
destined for sharing are highly 
processed and relatively small. 
The mouse-brain connectome, 
for example, is available as a 
3-megabyte file derived from 
many gigabytes of raw data (S. W. 
Oh et al. Nature 508, 207-214; 
2014). Neither is there a shortage 
of repositories: many institutional 
databases are freely available 
and well supported (such as 
zenodo.org, maintained by 
CERN, Europe’s particle-physics
lab in Geneva, Switzerland). More 
repositories will come online as 
researchers learn how to share 
data more effectively. 

Contrary to Goodhill’s 
suggestion, sharing computer 
code does not necessarily 
demand much time investment 
(see, for example, D. C. Ince et al. 
Nature 482, 485-488; 2012). 
Code is a valuable part of a paper, 
so everyone benefits if its authors 
assume from the start that it will 
be shared or reused. Also, people 
releasing code are under no 
obligation to maintain it. 
Stephen Eglen University of 
Cambridge, UK. 
sje30@cam.ac.uk 


Justifying embryo 
research in Europe 


It was a relief last month when the 
European Commission decided 
not to modify legislation on 
research involving the destruction 
of human embryos in response to 
a petition by the One of Us pro- 
life group. Even so, it is time to put 
a stop to this ‘democracy carousel’ 
(see Nature 508, 287; 2014). 

Such citizen campaigns against 
embryo destruction disregard 
the births of more than 5 million 
babies as a result of advances in 
reproductive medicine. Moreover, 
selective abortion following 
an adverse genetic diagnosis 
can often be avoided, owing to 
advances in screening embryos 
before implantation. And 
embryonic stem-cell research 
is opening up regenerative
medicine, which may eventually 
provide therapies for conditions 
such as pancreatic failure and age- 
related macular degeneration. 
Central to the debate is the 
ethical status of the human 
embryo between fertilization 
and implantation. Many believe 
that, although a zygote has 
the potential to develop into a 
person, it is not yet a person. On 
this basis, destruction of donated 
embryos for medical research can 
be justified provided the work is 
subject to strict regulation and 
supervision. Indeed, a recent 
(unpublished) study shows that 
donation of spare embryos is 
widely supported by couples 
undergoing in vitro fertilization 
in Europe. 
Joep Geraedts Maastricht 
University, the Netherlands. 
joep.geraedts@mumc.nl 


Still too many 
red-green figures 


People with red-green colour 
blindness cannot interpret 
figures in research papers that 
use these colours. We call for all 
journals to provide alternative 
versions of figures that are more 
accessible to such individuals. 

We searched Nature papers 
published in January-April 2014 
that contained at least one image 
requiring colour discrimination: 
roughly three-quarters used a 
red-green combination. Some 
journals now recommend that 
authors recolour their figures 
— green and magenta, say (see, 
for example, B. Wong Nature 
Methods 8, 441; 2011). 
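One widely used way to recolour a red-green overlay is to copy the red channel into the blue channel. Red signal then appears magenta, green is unchanged, and regions where the two overlap turn from yellow to white. The sketch below assumes 8-bit RGB pixels held as tuples and a figure whose blue channel carries no data of its own. The function name is hypothetical, and reading or writing real image files would need an imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (red, green, blue)


def red_green_to_magenta_green(pixels: List[Pixel]) -> List[Pixel]:
    """Recolour a red/green image as magenta/green.

    The red channel is copied into the blue channel, so red becomes
    magenta (red + blue), green is unchanged and red-green overlap
    (yellow) becomes white. Any existing blue data is discarded.
    """
    return [(r, g, r) for (r, g, _b) in pixels]


# Pure red, pure green and their overlap (yellow).
print(red_green_to_magenta_green([(255, 0, 0), (0, 255, 0), (255, 255, 0)]))
# [(255, 0, 255), (0, 255, 0), (255, 255, 255)]
```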

It would be preferable if 
journals could include a weblink 
to a colour-accessible version 
of red-green figures, and do so 
retroactively for archived figures. 
These would also be useful for 
making slideshows and posters. 
S. Colby Allred, William J. 
Schreiner, Oliver Smithies 
University of North Carolina 
School of Medicine, Chapel Hill, 
North Carolina, USA. 
samuel_allred@med.unc.edu 
Natural-born killers unleashed


The finding that phosphoinositide-3-OH kinase δ restrains the antitumour immune response by promoting the action of
suppressive immune cells may broaden the applicability of drugs targeting this enzyme to multiple cancers. SEE LETTER P.407


EMILIO HIRSCH & FRANCESCO NOVELLI 


Cancer cells carry mutations that lead to the production of tumour-associated antigen molecules, against which the immune system can react. In principle, this response allows the elimination of cells that have undergone cancer-causing changes. But tumours often escape this action by tweaking immune reactions and by promoting activities usually associated with the resolution of these responses. Typically, this involves the cancer cells influencing immune cells that have a regulatory function, such as regulatory T cells and myeloid-derived suppressor cells. On page 407 of this issue, Ali et al.’ report that, to escape the immune response, tumours require the action of the intracellular enzyme phosphoinositide-3-OH kinase δ. They also demonstrate that inhibiting this enzyme blocks the suppressive activity of immune-regulatory cells, thus strengthening the anticancer immune response.

Phosphoinositide-3-OH kinases (PI(3)Ks) are involved in the intracellular amplification of extracellular cues and, therefore, in many cellular functions, including proliferation, migration and metabolic control’. Mutated forms of PI(3)Ks often directly drive cancer formation, and their function in cells surrounding a tumour might also indirectly favour cancer growth’. PI(3)Kδ, the p110δ isoform of PI(3)K, is primarily expressed by white blood cells (leukocytes), including lymphocytes (the class to which T and B cells belong) and myeloid cells. For example, PI(3)Kδ functions in lymphocyte proliferation and migration’, and is implicated in cancers of these cells, such as chronic lymphocytic leukaemia and indolent B-cell lymphomas.

Recently developed drugs that inhibit PI(3)Kδ have shown particular effectiveness in the treatment of lymphocyte cancers and are expected to be approved by governmental regulatory agencies soon’. Nonetheless, the leukocyte-specific expression of PI(3)Kδ means that these drugs have been expected to be ineffective against solid tumours. Ali and colleagues’ findings challenge this concept, widening the clinical application of PI(3)Kδ-targeting drugs and suggesting that they may be used to support cancer immunotherapy — an increasingly successful anticancer strategy that aims to boost immune responses against tumour cells®.

Figure 1 | Inhibition of PI(3)Kδ releases immune suppression. a, Immune-suppressive T cells, called regulatory T cells (Treg cells), block the recognition and elimination of tumour cells by cytotoxic CD8+ T cells. b, Ali et al.' show that when both Treg and cytotoxic CD8+ T cells carry a mutation that inhibits the activity of the enzyme PI(3)Kδ, Treg-cell-mediated immune suppression is released. Although the cytotoxic CD8+ T cells themselves require PI(3)Kδ for their tumour-killing activity, the release of Treg-cell suppression is sufficient to induce a heightened anticancer response. c, Accordingly, when PI(3)Kδ is inhibited only in Treg cells, the anticancer effect mediated by cytotoxic CD8+ T cells is maximal.

These immune responses are driven by the CD8+ class of T cells. The cell-killing activity of these cells can be restrained by immune-suppressive CD4+ regulatory T cells (Treg cells), which are normally responsible for maintaining T-cell tolerance of self-antigens, preventing autoimmune reactions and dismantling immune responses, for example after successful elimination of a pathogen’. However, Treg cells also inhibit CD8+ T-cell-mediated killing of cancer cells, thus representing a major obstacle to cancer immunotherapy.

Ali and colleagues show that the growth and metastatic spread of different types of tumour transplanted into mice in which PI(3)Kδ is genetically inactivated are significantly inhibited compared with the tumours’ behaviour in normal mice. Because PI(3)Kδ is not expressed by the tumour cells themselves, this effect seems to be linked to a disturbed suppression of anticancer immunity. The researchers demonstrate that PI(3)Kδ activity is required for the proliferation and differentiation of suppressive Treg cells induced by tumour cells, and that deletion of PI(3)Kδ only in the Treg-cell population blocks tumour growth and prolongs the survival of mice after inoculation with cancer cells.
Previous studies have suggested that PI(3)Kδ functions in cells other than Treg cells. For example, a lack of PI(3)Kδ activity reduces the differentiation of cytotoxic CD8+ T cells, thus, paradoxically, reducing tumour-cell killing®. However, Ali et al. show that the effect of PI(3)Kδ inhibition is stronger on Treg cells than on CD8+ T cells, and that blocking Treg-cell-dependent inhibition of CD8+ T cells overcomes the reduction in CD8+ T-cell-mediated cytotoxicity (Fig. 1).

Treg cells are not alone in inhibiting the cytotoxic immune response: other leukocytes, including polymorphonuclear myeloid-derived suppressor cells (PMN-MDSCs), are stimulated by tumour cells to block cytotoxic CD8+ T-cell function. Ali and colleagues show that PI(3)Kδ inhibition not only reduces the number of Treg cells but also impairs the ability of PMN-MDSCs to limit T-cell proliferation. These synergistic effects explain the greatly enhanced anticancer immune reaction observed in mice lacking active PI(3)Kδ.

To assess whether these observations could be of therapeutic value, the authors investigated pharmacological inhibition of PI(3)Kδ in a model of spontaneously developed cancer in mice, focusing on one of the most deadly cancers, pancreatic ductal adenocarcinoma. Strikingly, they found prolonged survival in these animals, as a result of the expected blockade of Treg-cell activity and the accumulation of tumour-killing CD8+ T cells in the pancreas.

Although these results highlight the potential of PI(3)Kδ inhibition in cancer therapy, they do not clarify how PI(3)Kδ inhibition blocks the immune-modulating activity of Treg cells. Clues may come from Ali and colleagues’ finding that PI(3)Kδ inhibition impairs PMN-MDSC production of soluble messenger molecules that cause immune suppression and tumour growth. But how this occurs is still unclear, and future studies should more precisely define the intracellular mechanisms by which PI(3)Kδ controls immune suppression.

This strong proof of concept that PI(3)Kδ inhibition can boost the immune response against a wide range of tumours will probably fuel the already-intense testing of drugs that target this enzyme. Early clinical trials show that PI(3)Kδ inhibitors, such as idelalisib, are effective in and tolerated by humans®. However, little is known about the long-term effects of PI(3)Kδ inhibition and whether chronic treatment is sufficiently safe. For example, more work is required to exclude the possibility that long-term blockade of immune suppression leads to unwanted lymphocyte-mediated responses, such as autoimmunity. But if safety is demonstrated, PI(3)Kδ inhibition to unleash our spontaneous predisposition to eliminate cancer cells may become a valid option for cancer treatment.


APPLIED MATHEMATICS


How chaos forgets and remembers


“It is difficult to make predictions, especially about the future,” goes the proverb. A study of the dynamics of chaotic systems in the context of information theory adds a twist to this saying.


P.-M. BINDER & R. M. PIPES 


The chaos revolution is now more
than 50 years old’. Long anticipated 
by mathematicians, beginning with 
Henri Poincaré in the 1880s, the field finally 
emerged on the science stage in the early 1960s 
following the identification of chaotic behav- 
iour in computer simulations of atmospheric 
and astronomical systems. Since then, many 
experimental observations of chaotic dynam- 
ics (such as in fluids, electric circuits, lasers 
and insect populations) and parallel theoreti- 
cal developments have transformed the field 
into a fully fledged area of research. The central 
tenet of chaos is that simple deterministic sys- 
tems — those in which the past uniquely deter- 
mines the present and the present pegs down 
the future — can display behaviour that seems 
random. Writing in Physics Letters A, James 
and co-workers’ present a study that adds a 
layer of subtlety to this statement: they show 
that measurements on a deterministic system 
that evolves in time may contain information 
that is specific to its past, present and future. 
In the early days of nonlinear science and 
chaos, several branches of knowledge from 
within mathematics, physics and computer 
science helped to create a terminology that 
has become the lingua franca of a broad
community of scientists from the physical, life
and social sciences. Geometry and informa- 
tion theory stand out among these contribu- 
tors. The first interprets the dynamics of a 
chaotic system as motion in an abstract space 
of system states, leading to the characterization 
of beautiful and elegant swirling trajectories 
called strange attractors. The second, inspired 
by the work of physicist Ludwig Boltzmann 
and mathematician Claude Shannon, focuses 
on calculations based on the probabilities of 
specific realizations of a chaotic system. A
central finding of the latter approach is that 
information is produced by chaotic systems 
as they evolve in time. Strong connections 
between geometry and information theory 
have helped to unify the field of chaos (see, for 
example, ref. 3). James and colleagues’ study is 
based on information theory. 

In principle, a chaotic system is as predict- 
able as clockwork, although much less regular. 
In practice, the famous butterfly effect* ampli- 
fies exponentially into the future any uncer- 
tainty about the initial state of a dynamical 
system. But what James et al. found is deeper. 
Working within the larger context of how to 
infer models from data, they selected sev- 
eral examples of one- and two-dimensional 
chaotic systems for which a measurement can 
be made as coarse as possible — taking a value 
of either zero or one — through a technique 
called symbolic dynamics*. Consequently, 
records of the dynamics of these systems are 
represented as strings of zeros and ones. The 
authors then performed information-based calculations on these strings that allowed them to correlate a present measurement to its past and future. They found that, typically, some of the information measured in the present comes from the past (redundant or predicted) and the rest is newly created. Focusing on the created information, they further found that some of it (ephemeral) does not carry into the future — in other words, it is readily forgotten, with the rest (bound) being remembered and carried into the future. This fine-graining of information sheds new light on how chaotic processes work.

Figure 1 | Chaos measurements in time. A real-life chaotic system, such as an electric circuit with an alternating-current power supply and several nonlinear elements (left), produces a time series of states (middle) that may not be fully accessed by an experimentalist but can be codified as a binary sequence of zeros and ones (right). The most salient regularity of the sequence produced in this case is that no two consecutive zeros occur. In accordance with James and colleagues’ results’, the binary outcomes of measurements of this system provide, on average, some knowledge of the past and the future of the system. Outcomes of ‘1’ allow previous information about the system to be forgotten, and ‘0’ outcomes guarantee that both the previous and subsequent measurements are ‘1’.
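The two ingredients described above, exponential amplification of initial uncertainty and a coarse 0/1 measurement, can be illustrated with the fully chaotic logistic map x -> 4x(1 - x). This map is our own stand-in; it is not necessarily one of the systems James et al. analysed, and the 0.5 threshold used to assign symbols is an assumption of the sketch. Two orbits that start 10^-10 apart give identical bit strings for a while and then disagree, and the average stretching rate along an orbit (the Lyapunov exponent) should come out close to ln 2, about 0.693 per step, so an initial uncertainty roughly doubles with each iteration.

```python
import math


def logistic(x):
    """Fully chaotic logistic map, x -> 4x(1 - x), on the unit interval."""
    return 4.0 * x * (1.0 - x)


def symbolize(x0, n):
    """Coarse-grain an orbit into bits: 0 if x < 0.5, else 1."""
    bits, x = [], x0
    for _ in range(n):
        bits.append(0 if x < 0.5 else 1)
        x = logistic(x)
    return bits


def lyapunov(x0, n=100_000, burn_in=1_000):
    """Average of ln|f'(x)| = ln|4(1 - 2x)| along the orbit started at x0."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = logistic(x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(4.0 * (1.0 - 2.0 * x)))
        x = logistic(x)
    return total / n


# Two measurement records from almost identical initial states.
a = symbolize(0.2, 60)
b = symbolize(0.2 + 1e-10, 60)
first_difference = next(i for i, (p, q) in enumerate(zip(a, b)) if p != q)
print(first_difference, round(lyapunov(0.2), 2))
```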

To illustrate these concepts, consider a real- 
life chaotic system, such as an electric circuit, 
for which coarse measurements can be made 
but the true states of which are inaccessible 
(Fig. 1). In this example, any sequence with 
two consecutive zeros cannot happen. Without 
performing any calculations, some manifesta- 
tions of the types of information identified in 
James and co-workers’ study can be gleaned. A 
sequence ‘01’ is an example of redundant infor- 
mation, because the zero always implies a one. 
A measurement of ‘1’ can be preceded by either 
‘0’ or ‘1’; this exemplifies ephemeral informa- 
tion, because what came before it becomes 
irrelevant and is forgotten. Finally, a measure- 
ment of ‘0’ carries bound information because 
the system remembers and evolves to ‘1’.
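For a process like the one in Fig. 1, in which a ‘0’ is never followed by another ‘0’, the three kinds of information can be computed exactly. The sketch below assumes a specific version that the article does not give: after a ‘1’ the next symbol is a fair coin flip, and after a ‘0’ it is always ‘1’ (often called the golden mean process). Because this process is Markov of order 1, the quantities reduce to three consecutive symbols. Redundant information is the mutual information between a symbol and its predecessor. Ephemeral information is the entropy of a symbol given both neighbours. Bound information is the rest of the entropy rate. This three-symbol reduction is our simplification for the Markov case; the general definitions condition on the entire past and future. The three parts add up to the entropy of a single measurement.

```python
import itertools
import math

# Hypothetical transition probabilities P(next | current).
# After '1': fair coin flip. After '0': always '1', so '00' never occurs.
T = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.5, 1: 0.5}}


def stationary(T, iters=200):
    """Stationary distribution of the chain by power iteration."""
    pi = {s: 1.0 / len(T) for s in T}
    for _ in range(iters):
        pi = {s: sum(pi[a] * T[a][s] for a in T) for s in T}
    return pi


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def anatomy(T):
    """Split H[X0] into redundant, ephemeral and bound information for an
    order-1 Markov chain, using the window (X_-1, X_0, X_1)."""
    pi = stationary(T)
    joint = {(a, b, c): pi[a] * T[a][b] * T[b][c]
             for a, b, c in itertools.product(T, repeat=3)}

    def H(keep):
        # Entropy of the marginal over positions (0 = X_-1, 1 = X_0, 2 = X_1).
        marg = {}
        for key, p in joint.items():
            sub = tuple(key[i] for i in keep)
            marg[sub] = marg.get(sub, 0.0) + p
        return entropy(marg.values())

    h_x0 = H([1])
    entropy_rate = H([0, 1]) - H([0])       # H[X_0 | X_-1]
    redundant = h_x0 - entropy_rate         # I[X_0 ; X_-1]
    ephemeral = H([0, 1, 2]) - H([0, 2])    # H[X_0 | X_-1, X_1]
    bound = entropy_rate - ephemeral        # I[X_0 ; X_1 | X_-1]
    return h_x0, redundant, ephemeral, bound


print([round(v, 3) for v in anatomy(T)])  # [0.918, 0.252, 0.459, 0.208]
```

In this toy case, of the roughly 0.92 bits in each measurement, about 0.25 bits are predictable from the previous symbol, about 0.46 bits are newly created and then forgotten, and about 0.21 bits are newly created and carried forward.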

James and colleagues’ results show how the 
past and future of an evolving chaotic system 
become intertwined with its present. This 
feature may be at the heart of one of the most 
enigmatic of physical principles: the second 
law of thermodynamics, which states that the 
entropy of an isolated system never decreases 
with time. The statistical, irreversible charac- 
ter of this law is at odds with the underlying 
deterministic and reversible dynamics of such 
isolated systems at the microscopic level®’. 
The idea of applying the information-based 
methods presented here to thermodynamic 
systems, such as collections of gas molecules, 
is promising. Considering the entropy of such 
a collection as a property of its state might lead 
to insight into the ‘arrow of time’ in the second 
law, especially because, as James and co-workers show, chaos both forgets and remembers.
Immaturity in the gut microbial community


Undernourished children fall behind not only on growth, but also on maturation 
of their intestinal bacterial communities, according to a study comparing acutely 
malnourished and healthy Bangladeshi children. SEE LETTER P.417 


ELIZABETH K. COSTELLO & DAVID A. RELMAN 


Effective assessments of child growth
rely on a knowledge of underlying
processes, appropriate standards and
accurate measurements. These elements 
form a comparative framework with which 
trajectories can be charted and developmen- 
tal milestones marked, providing ‘actionable 
intelligence’ on health and disease in individu- 
als and populations. In this issue, Subramanian 
et al.’ (page 417) chart a different path — one 
in which the milestones are microbial — for 
young children living in the Mirpur urban 
slum of Dhaka, Bangladesh, many of whom 
suffer from undernutrition (Fig. 1). The 
authors find evidence for delays in the devel- 
opment of gut bacterial communities in 
acutely malnourished children compared with 
healthy children, and that these delays are only 
fleetingly ameliorated by standard treatment. 
The team’s approach for classifying and track- 
ing gut microbiota may enhance assessments 
of childhood health and development, and 
improve therapeutic strategies. 

Growth faltering in early childhood is the 
hallmark of undernutrition, a pervasive condi- 
tion in the developing world that arises from 
insufficient intake, absorption or assimila- 
tion of nutrients. Undernutrition results from 
scarce and nutrient-poor food, poor-quality 
water and unsanitary living conditions. Recurrent bouts of gastrointestinal infections exacerbate and perpetuate the problem. Positive
feedback loops can ensue both in individu- 
als, when intestinal damage reinforces poor 
growth and susceptibility to infection, and over 
generations, when maternal undernutrition 
causes undernutrition in children. Over time, 
these cycles can impair learning, limit pro- 
ductivity and ultimately perpetuate poverty. 
Maternal and child undernutrition were a factor in 3.1 million (45% of all) deaths in children
under 5 years of age in 2011. Children under
2 are particularly vulnerable to undernutrition 
(and infection), but are also the most responsive to treatment.

In early childhood, assembly of gut micro- 
bial communities results from the sequential 
arrival of taxa from external sources and the 
extinction of taxa already present, in part in 
response to age-associated events such as 
weaning. Progression towards an adult-like,
‘mature’ state occurs over the first 2-3 years of 
life. These postnatal events prompt the termi- 
nal maturation of intestinal structures, stimu- 
late immune responses and provide resistance 
to invasion by pathogens; aberrant or delayed 
assembly is associated with altered metabo- 
lism and immune function. Thus, because 
they affect and are affected by similar factors, 
undernutrition and gut-microbiota develop- 
ment are closely intertwined. Unravelling 
the two and discerning the role of host 
and environmental factors are daunting but 
important goals.

Anthropometric indicators — physical 
measurements, such as weight for height, 
which are scored relative to a reference popu- 
lation — are indispensable tools in the assess- 
ment and treatment of undernutrition. Not 
surprisingly, equivalent international stand- 
ards for gut-microbiota development are not 
yet available: few individuals have been fol- 
lowed in detail over the requisite time frame, 
and it also seems that microbiota composition 
in early childhood differs across populations.
In light of this, Subramanian and colleagues 
examined healthy and malnourished children 
from the same urban area, ostensibly mini- 
mizing genetic and environmental differences 
between the two groups. 

To derive a model of gut-microbiota devel- 
opment in the Bangladeshi children, the 
researchers collected faecal samples from 
50 well-nourished subjects at monthly inter- 
vals over the first 2 years of life. Next, they 
surveyed the bacterial communities in the 
samples by sequencing 16S ribosomal RNA 
genes, which are used to define and enumerate 
bacterial taxa. The taxa from 12 of the subjects 
were then assessed and ranked according to 
their ability to discriminate between different 
host ages. The authors found that the 24 most 
age-discriminatory taxa could predict the ages 
of the remaining 38 healthy children from their 
gut microbial composition. 
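The modelling step above can be illustrated with a deliberately simple stand-in. The article does not say here which model the authors used, so the sketch below swaps in two plain techniques: it ranks taxa by the absolute Pearson correlation between their relative abundance and age (an assumed proxy for "ability to discriminate between host ages"), keeps the top-ranked taxa, and predicts a child's microbiota age as the mean age of the k most similar training profiles. The function names, the choice of k and the toy cohort are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_taxa(profiles, ages):
    """Taxon indices ordered from most to least age-discriminatory.

    'Age-discriminatory' is approximated (an assumption) by the absolute
    correlation between a taxon's abundance and host age.
    """
    n_taxa = len(profiles[0])
    scores = []
    for j in range(n_taxa):
        column = [p[j] for p in profiles]
        scores.append((abs(pearson(column, ages)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores]


def predict_age(train_profiles, train_ages, query, taxa, k=3):
    """Mean age of the k nearest training profiles (Euclidean distance over
    the selected taxa only)."""
    distances = []
    for profile, age in zip(train_profiles, train_ages):
        d = math.sqrt(sum((profile[j] - query[j]) ** 2 for j in taxa))
        distances.append((d, age))
    distances.sort()
    nearest = distances[:k]
    return sum(age for _, age in nearest) / len(nearest)


# Hypothetical cohort: ages 1-24 months, three taxa (two age-linked, one not).
ages = list(range(1, 25))
profiles = [[a / 24, 1 - a / 24, (a * 7 % 5) / 5] for a in ages]
top = rank_taxa(profiles, ages)[:2]
estimate = predict_age(profiles, ages, [0.5, 0.5, 0.0], top)
print(top, estimate)
```

Training on one subset of healthy children and predicting the ages of the rest, as the study did with 12 and 38 subjects, would simply mean calling `rank_taxa` on the first group and `predict_age` on each profile of the second.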

Figure 1 | Children from the Mirpur slum of Dhaka, Bangladesh.

In keeping with the tradition of anthropometric indicator scores, the authors defined two indicators of gut-microbiota maturation: relative microbiota maturity and a microbiota-for-age Z-score (MAZ). The overall gist is this: if the model classifies your gut microbiota as that of a 6-month-old when you are actually 18 months old, then your gut microbiota is probably ‘immature’ — its composition looks ‘younger’ than that of most healthy people of your age (although it may be different in other ways, too). Applying these indicators to their well-nourished cohort, the authors found that microbiota maturity decreased during diarrhoeal episodes, increased with infant-formula consumption, was unchanged by recent antibiotic use and was correlated among family members.
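The two indicators can be written down in a few lines if one assumes definitions patterned on anthropometric Z-scores: relative maturity as a child's predicted microbiota age minus the median microbiota age of healthy children of the same chronological age, and MAZ as that deviation divided by the standard deviation among those healthy children. These formulas are an assumption modelled on the description above, and the reference values are invented for illustration.

```python
import statistics


def relative_maturity(predicted_age, healthy_same_age):
    """Predicted microbiota age minus the median microbiota age of healthy
    reference children with the same chronological age (months)."""
    return predicted_age - statistics.median(healthy_same_age)


def microbiota_for_age_z(predicted_age, healthy_same_age):
    """Z-score analogue: deviation from the healthy median, in units of the
    healthy reference group's standard deviation."""
    median = statistics.median(healthy_same_age)
    sd = statistics.stdev(healthy_same_age)
    return (predicted_age - median) / sd


# Hypothetical reference: predicted microbiota ages of healthy 18-month-olds.
healthy_18mo = [16.0, 17.0, 18.0, 19.0, 20.0]
print(relative_maturity(6.0, healthy_18mo))     # -12.0 (months 'younger')
print(microbiota_for_age_z(6.0, healthy_18mo))  # about -7.59
```

A negative score flags the 'immature' case in the example above: an 18-month-old whose microbiota is classified as that of a 6-month-old.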

Subramanian and colleagues then applied 
their microbiota-maturation indices to 64 chil- 
dren aged 6-20 months at the start of the study 
who were sampled during and after inpatient 
treatment for severe acute malnutrition. The 
children were participating in a randomized 
trial comparing two therapeutic foods, in com- 
bination with supportive therapy that included 
antibiotics. Compared with healthy children, 
the malnourished children showed significant 
microbiota immaturity during treatment, 
regardless of treatment group. Notably, in the 
2-3 months following treatment, the children’s 
microbiota-maturation scores improved sig- 
nificantly; however, after this period, much of 
this catch-up maturation was lost. These pat- 
terns mirrored the anthropometric outcomes 
of the study: although they gained weight 
initially, children in both groups remained 
severely underweight compared with healthy 
children at the end of the follow-up period. 
The results also support previous studies of 
undernutrition in humanized mouse models.

Degraded ecosystems are notoriously dif- 
ficult to restore. Often, such efforts focus on 
restoring environmental conditions (akin to 
the food intervention in Subramanian and 
colleagues’ study) and eliminating unwanted 
species (akin to the antibiotic therapy), then waiting for assembly processes to play out ‘naturally’ to restore the desired community. But degraded communities can be resistant or resilient to change, and although host health can be restored, youth cannot. The composition of mature communities may depend on the timing and order of earlier species introductions (and extinctions) and may prove
difficult to reconstitute (by the use of probiot- 
ics, for example). Thus, an ounce of prevention 
is likely to be worth a pound of cure and, as 
with other types of developmental delays, early 
intervention may be crucial. 

The approach presented by Subramanian 
et al. could be used to develop standards across 
the globe, and then to monitor gut colonization during early childhood, as an early-warning system for microbiotas that are falling ‘off
track’ (and there may be many such tracks 
to health). A detailed analysis of microbiota 
maturation in well-nourished populations 
will complement this work, and allow further 
deconvolution of some of the common micro- 
biota insults that were unavoidably layered and 
repeated in the current study. It is becoming 
clear that recognizing which features of micro- 
biota assembly are associated with health, and 
understanding whether and how healthy com- 
munities bounce back after disturbance, are 
key requirements for future human-development roadmaps.
Powered by magic


What gives quantum computers that extra oomph over their classical digital 
counterparts? An intrinsic, measurable aspect of quantum mechanics called 
contextuality, it now emerges. SEE ARTICLE P.351 


STEPHEN D. BARTLETT 


For decades, researchers have struggled with the question of what makes quantum computers so powerful, and the
answer has been as elusive as an understanding 
of quantum physics itself. Is there some unique 
feature of quantum physics that is responsible 
for enabling quantum computers to perform 
certain computations faster than their conven- 
tional digital counterparts? Many of the more 
exotic properties of quantum mechanics have 
been put forward as possible candidates, but so
far none has held up to scrutiny. On page 351 
of this issue, Howard et al. uncover a remarkable connection between the power of quantum
computers and one of the stranger properties 
of quantum theory known as contextuality. 
Designs for quantum computers often mir- 
ror those of conventional computers, in that 
they are built out of basic components such as 
logic gates that perform elementary operations 
on quantum bits of information. A commonly 
used set of operations for a quantum processor 
is known as the stabilizer operations. These
operations are designed using the rules of 
quantum physics, but are in many ways simi- 
lar to those used by a classical machine. For 
example, initializing quantum bits to a value 
of 0 or 1, reading out these binary values or 
flipping them are all stabilizer operations 
(as well as more-exotic ones). In fact, within 
this limited set of building blocks, it is often 
possible to imagine that the quantum bits are 
simply described by pairs of bits (0’s and 1’s) 
that are initialized, processed and measured 
by stabilizer operations much like the bits in 
a digital computer. The restricted ‘classical’
nature of the stabilizer operations lets quan- 
tum engineers design error-correcting codes 
and logic gates that are tolerant when things 
go wrong. 
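The ‘pairs of bits’ picture can be sketched for a single qubit. Here a stabilizer state is summarized by the Pauli operator (Z, X or Y) that leaves it unchanged, encoded as two bits (x, z) plus a sign, and each stabilizer gate merely shuffles those bits. This is a toy single-qubit version of the well-known Gottesman-Knill idea, not the construction Howard et al. use; the class and method names are hypothetical.

```python
import random


class PauliQubit:
    """Single-qubit stabilizer state tracked as bits (x, z) plus a sign.

    (x, z) = (0, 1) means Z, (1, 0) means X, (1, 1) means Y; sign is +1 or -1.
    """

    def __init__(self):
        self.x, self.z, self.sign = 0, 1, 1  # |0> is stabilized by +Z

    def h(self):
        # Hadamard swaps X and Z and sends Y to -Y.
        if self.x and self.z:
            self.sign = -self.sign
        self.x, self.z = self.z, self.x

    def s(self):
        # Phase gate: X -> Y, Y -> -X, Z -> Z.
        if self.x and self.z:
            self.sign = -self.sign
        self.z ^= self.x

    def flip(self):
        # Bit flip (Pauli X gate): anticommutes with Z and Y, so negates them.
        if self.z:
            self.sign = -self.sign

    def measure_z(self, rng=random):
        """Read out 0 or 1 in the computational basis."""
        if self.x == 0:  # stabilizer is +Z or -Z: outcome is deterministic
            return 0 if self.sign == 1 else 1
        outcome = rng.randint(0, 1)  # X or Y stabilizer: fair coin
        self.x, self.z, self.sign = 0, 1, 1 if outcome == 0 else -1
        return outcome


q = PauliQubit()
q.h()
q.s()
q.s()
q.h()  # H S S H = H Z H = X, so |0> becomes |1>
print(q.measure_z())  # 1
```

Nothing in the bookkeeping grows with the number of gates, which is the intuition behind stabilizer-only machines being efficiently simulable on a classical computer.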

Any quantum machine that computes 
using only these stabilizer operations is no 
more powerful than your desktop computer. So how can we supplement this set to
build a quantum computer? There are several 
approaches, but by far the most common is to 
provide the computer with a large number of 
additional quantum bits that are initialized in 
a peculiar way, using a quantum superposition of the usual stabilizer initializations. A
quantum bit described by a superposition pos- 
sesses characteristics of both binary values 0 
and 1 simultaneously. This way of initializing 
the quantum bits is called magic — a rather 
suitable name for some of the quantum weird- 
ness that contradicts our everyday experience. 
Supply a processor that uses only stabilizer 
operations with quantum bits initialized as 
magic states and — hey presto! — that limited 
machine is endowed with the full power of a
quantum computer. 
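For a single qubit, what stabilizer operations can prepare has a simple geometric picture. The six stabilizer states sit at the Bloch-sphere points (±1,0,0), (0,±1,0) and (0,0,±1), so their probabilistic mixtures fill the octahedron |x| + |y| + |z| ≤ 1, and a magic state must lie outside it. The sketch below checks this; the particular magic state cos(π/8)|0⟩ + sin(π/8)|1⟩ is a standard choice assumed here for illustration, and the qubit picture is simpler than the setting Howard et al. analyse (the article notes that qubits are a subtle case).

```python
import math


def bloch_vector(alpha, beta):
    """Bloch vector (x, y, z) of the normalized qubit state alpha|0> + beta|1>."""
    overlap = alpha.conjugate() * beta
    return (2 * overlap.real, 2 * overlap.imag, abs(alpha) ** 2 - abs(beta) ** 2)


def is_stabilizer_mixture(vector, tol=1e-12):
    """True if the Bloch vector lies in the stabilizer octahedron
    |x| + |y| + |z| <= 1, i.e. it is a mixture of stabilizer states."""
    return sum(abs(c) for c in vector) <= 1 + tol


r = 1 / math.sqrt(2)
zero = bloch_vector(1 + 0j, 0j)                  # |0>, a stabilizer state
plus = bloch_vector(complex(r), complex(r))      # |+>, a stabilizer state
magic = bloch_vector(complex(math.cos(math.pi / 8)), complex(math.sin(math.pi / 8)))
noisy_magic = tuple(0.5 * c for c in magic)      # 50:50 mix with the maximally mixed state

for name, v in [("|0>", zero), ("|+>", plus), ("magic", magic), ("noisy magic", noisy_magic)]:
    print(name, is_stabilizer_mixture(v))
```

The last case shows why magic states must be supplied with some care: mixed with enough noise, a magic state falls back inside the octahedron and could then have been produced by stabilizer operations and coin flips alone.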

If a quantum computer that uses only stabilizer operations is stuck in the slow lane together
with today’s run-of-the-mill digital computers, 
but can be ‘boosted’ to a powerful quantum 
computer by being supplied with magic states, 
then these magic states must hold the key to the 
quantum computer's increased performance. 
So what is so special about magic states? The 
answer provided by Howard et al. comes 
from studying how these states might also be 
described using pairs of bits for each quantum 
bit, as we could for stabilizer operations. The 
authors formalize this perspective by using a 
non-contextual hidden-variable theory, which 
is a way of describing the properties of a quan- 
tum particle or device using the values (hidden 
variables) of a number of bits. The non-contex- 
tuality comes from the desire to have these bits 
take consistent values throughout the compu- 
tation, regardless of when and how we might 
hypothetically take a peek at their values (the 
context in which we measure the bits). 

We have long known that not all of quantum 
physics can be described by a non-contextual 
hidden-variable theory, and there are experi- 
mental tests that can be used to prove that 
quantum systems are contextual and so evade
any possible classical description. In their
study, Howard and colleagues show that what 
makes magic states special is precisely their 
contextuality. Specifically, they find that magic 
states possess exactly the properties needed to 
prove that quantum physics is contextual using 
an experimental test that relies only on stabi- 
lizer operations. That is, the authors demon- 
strate that this particular measurable aspect of 
quantum weirdness — contextuality — is the 
source of a quantum computer's power. 
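A textbook way to see contextuality using nothing but stabilizer (Pauli) measurements is the Peres-Mermin square for two qubits. It is not the specific test used by Howard et al., but it shows concretely what ‘no non-contextual hidden-variable theory’ means: a brute-force search over all 512 ways of pre-assigning fixed ±1 values to the nine observables finds none that reproduces every row and column constraint imposed by quantum mechanics. The function name is hypothetical.

```python
from itertools import product

# Peres-Mermin square of two-qubit Pauli observables (each has outcomes +1/-1):
#   XI  IX  XX
#   IZ  ZI  ZZ
#   XZ  ZX  YY
# The observables in each row (and each column) commute. Quantum mechanics fixes
# the product of each row to +1, and of the columns to +1, +1, -1.
ROW_PRODUCTS = (+1, +1, +1)
COLUMN_PRODUCTS = (+1, +1, -1)


def count_assignments(row_products, column_products):
    """Count +/-1 value assignments to the nine cells (a non-contextual
    hidden-variable model) satisfying all row and column product constraints."""
    count = 0
    for values in product((+1, -1), repeat=9):
        grid = [values[0:3], values[3:6], values[6:9]]
        rows_ok = all(grid[r][0] * grid[r][1] * grid[r][2] == row_products[r]
                      for r in range(3))
        cols_ok = all(grid[0][c] * grid[1][c] * grid[2][c] == column_products[c]
                      for c in range(3))
        if rows_ok and cols_ok:
            count += 1
    return count


print(count_assignments(ROW_PRODUCTS, COLUMN_PRODUCTS))  # 0
```

The search fails for a simple reason: the rows demand that the product of all nine values be +1, while the columns demand -1. That qubit Pauli measurements alone already exhibit contextuality of this kind hints at why qubits are the subtle case discussed next.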

A few curious details remain unresolved. 
First, there are some subtleties that limit what 
these results can say about quantum bits — 
the most elementary quantum systems — as 
opposed to larger quantum systems. The limi- 
tations could simply be a vagary of the proof 
technique used by the authors, or could be a
hint of something deeper. There also remain 
some unanswered questions regarding the 
power of states with vanishingly small amounts 
of magic. And finally, does contextuality power 
other quantum-computing architectures that 
supplement stabilizer operations in other ways 
than supplying magic states, such as those 
designed around quantum measurements?
Further refinements of the possible tests of 
contextuality to the most general situations 
could clarify these outstanding issues. 
Knowing that contextuality supplies the
magic for quantum computers is much more 
than a satisfying connection. This find- 
ing also promises to help researchers design 
better architectures for quantum machines. In 
many of the most sophisticated models for a 
potential quantum computer, just manipulat- 
ing magic states into a usable form consumes 
most of the processor time. New architectures 
that are thriftier in their use of this contextuality resource may be much easier to build.
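Why magic-state preparation dominates the running time can be made concrete with a widely cited protocol that the article does not name: 15-to-1 magic-state distillation, in which 15 noisy magic states are consumed to produce one with error roughly 35ε³ to leading order. The sketch assumes that leading-order error map and perfect stabilizer operations; the function name and the example error rates are hypothetical.

```python
def distillation_rounds(raw_error, target_error):
    """Rounds of 15-to-1 distillation needed to bring the magic-state error
    from raw_error down to target_error, and raw states used per output.

    Assumes the leading-order error map e -> 35 * e**3 and error-free
    stabilizer operations.
    """
    error, rounds = raw_error, 0
    while error > target_error:
        new_error = 35 * error ** 3
        if new_error >= error:  # distillation no longer helps in this model
            raise ValueError("raw error is above the model's distillation threshold")
        error = new_error
        rounds += 1
    return rounds, 15 ** rounds


print(distillation_rounds(1e-3, 1e-6))   # (1, 15)
print(distillation_rounds(1e-3, 1e-15))  # (2, 225)
```

Under these assumptions, each output magic state at very low error already costs hundreds of raw states, which is the kind of overhead a thriftier use of the contextuality resource would aim to cut.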
Balancing act


The enzyme parkin is known to promote disposal of organelles called mitochondria 
that have suffered damage. The identification of an enzyme that opposes parkin 
demonstrates how a delicate balance is maintained in the cell. SEE ARTICLE P.370 


ALBAN ORDUREAU & J. WADE HARPER 


Cells have a love-hate relationship with mitochondria. As the power plants of cells, these organelles provide the
energy required for life, but mitochondrial 
defects can lead to the production of reactive 
oxygen species that disrupt crucial cellular 
functions. Cells therefore use a specialized 
program, mitophagy, to eliminate damaged 
mitochondria and so maintain cellular health. 
Although a mitophagy signalling pathway 
consisting of two enzymes, PINK1 and parkin,
has been identified, it is not clear what factors
inhibit the pathway. In this issue, Bingol et al.
(page 370) report that USP30, a deubiquitinat- 
ing enzyme, puts the brakes on mitophagy. 

In cells with healthy mitochondria, parkin 
is located in the cytoplasm and is thought to 
be inactive, whereas PINK1 is associated
with mitochondria. Activation of PINK1 in
response to mitochondrial damage causes 
migration of parkin, a ubiquitin ligase, to the 
outer membrane of the mitochondrion, and its 
subsequent activation by PINK1 (refs 2, 4, 5).
Activated parkin then transfers a small protein 
called ubiquitin to one or more lysine amino- 
acid residues on dozens of proteins bound to 
the mitochondrial outer membrane. Following this ubiquitination process, the ubiquitin tags are recognized by the cell’s mitophagy machinery, leading to mitochondrial degradation. Defects in mitochondrial quality control, brought about by mutations in PINK1 and parkin, are the cause of certain neurodegenerative disorders, such as some early-onset familial forms of Parkinson's disease.
The pathways downstream of ubiquitination 
at the mitochondrial outer membrane are far 
from clear, but specific ubiquitinated targets 
and the total number of ubiquitin modifica- 
tions on target proteins have been offered 
as possible factors in the recruitment of the 
mitophagy machinery to mitochondria””’. 
Protein ubiquitination is a reversible modifi- 
cation — indeed, the human genome encodes 
more than 100 deubiquitinating enzymes. 
Bingol and colleagues therefore reasoned that 
inhibiting deubiquitination of
parkin targets could be a tool to
restore the balance of mitophagy 
in cells in which the PINK1- 
parkin pathway is defective 
(Fig. 1). The authors induced 
mitophagy experimentally in 
cells by triggering mitochon- 
drial depolarization, which 
activates PINK1, and performed 
a screen for deubiquitinating 
enzymes that could prevent 
parkin-dependent mitophagy. 
This quest resulted in the iden- 
tification of USP30. 
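The idea of a ‘balance’ between a ubiquitin ligase and a deubiquitinase can be illustrated with a toy mass-action model that is not taken from Bingol et al.: treat the fraction u of a target protein carrying ubiquitin as being added at a rate set by parkin and removed at a rate set by USP30. The rate constants and the mitophagy threshold below are hypothetical.

```python
def steady_state_ubiquitination(k_ligase, k_dub):
    """Steady-state ubiquitinated fraction in the toy model
        du/dt = k_ligase * (1 - u) - k_dub * u,
    which gives u* = k_ligase / (k_ligase + k_dub)."""
    return k_ligase / (k_ligase + k_dub)


def simulate(k_ligase, k_dub, u0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the same equation, as a sanity check."""
    u = u0
    for _ in range(steps):
        u += dt * (k_ligase * (1 - u) - k_dub * u)
    return u


def triggers_mitophagy(k_ligase, k_dub, threshold=0.5):
    # Hypothetical rule: the mitophagy machinery is recruited once the
    # steady-state ubiquitinated fraction reaches the threshold.
    return steady_state_ubiquitination(k_ligase, k_dub) >= threshold


# Weak parkin activity opposed by strong deubiquitinase activity...
print(triggers_mitophagy(k_ligase=1.0, k_dub=3.0))  # False (u* = 0.25)
# ...crosses the threshold once deubiquitinase activity is reduced.
print(triggers_mitophagy(k_ligase=1.0, k_dub=0.5))  # True (u* is about 0.67)
```

In this caricature, lowering the deubiquitinase rate shifts the balance towards ubiquitination without any change in ligase activity, which is the intuition behind inhibiting USP30 in cells whose PINK1-parkin signalling is weakened.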

The authors found that USP30 not only prevented the mitophagy machinery from recognizing damaged mitochondria, but also reversed the accumulation of ubiquitin on proteins bound to the mitochondrial outer membrane, indicating that USP30 directly opposes parkin function. In a cell-wide analysis of ubiquitin-tagged proteins, Bingol and co-workers identified 41 targets of parkin ubiquitination that could be deubiquitinated by USP30, including TOM20, a subunit of the mitochondrial translocase enzyme, which is responsible for transport of proteins across the outer mitochondrial membrane. Surprisingly, the classic parkin target protein, mitofusin, was resistant to USP30-driven deubiquitination. An understanding of how USP30 selectively removes ubiquitin from some but not other parkin targets will require further work, but it is conceivable that the proteins that are not targeted by USP30 are simply those that are most efficiently ubiquitinated.

Although the PINK1-parkin pathway is known to promote mitophagy in response to chemical or genetic disruption in cell-culture samples, its role in neurons — which are critically affected in Parkinson's disease — has been controversial. In an experimental tour de force, Bingol et al. examined USP30 and its role in mitophagy in cultured rat neurons and in fruit flies genetically engineered to model Parkinson’s disease.

By tracking mitochondria in rat neurons as they underwent mitophagy in ‘normal’ situations (without the need to artificially activate the pathway), the authors demonstrated that levels of mitophagy were reduced by loss of PINK1 and parkin, and increased by depletion of USP30. Thus, USP30 opposes PINK1-parkin-dependent mitophagy in healthy neurons, in which defective mitochondria probably arise as a result of the oxidative stress that occurs during normal cellular function.

Fruit flies that model Parkinson's disease have defective mitochondria in flight-muscle cells, a reduced ability to climb and reduced levels of the neurotransmitter dopamine. Bingol and colleagues found that these defects were largely reversed when USP30 was removed throughout the animal. To examine the dopamine-producing neurons that are affected by Parkinson's disease, the authors genetically deleted USP30 specifically in these cells and treated the insects with paraquat, a mitochondrial toxin that elicits Parkinson’s-disease-like symptoms in humans and reduces dopamine levels in fruit flies. Depletion of USP30 in dopamine-producing neurons largely reversed dopamine loss and behavioural defects, and increased survival, indicating that USP30 might actively oppose the PINK1-parkin pathway in cell types affected by Parkinson's disease.

Figure 1 | Maintaining balance in mitophagy. A graphical representation of how cellular health can be affected by changes in the levels of mitophagy — the process by which cells dispose of mitochondria that have become defective or damaged. Insufficient or excessive mitophagy reduces cellular health, owing to accumulation of defective mitochondria or disposal of too many mitochondria, respectively. PINK1-parkin signalling promotes mitophagy, and Bingol et al. now find that an enzyme, USP30, opposes the action of the parkin enzyme and inhibits mitophagy (not shown). Thus, in cells that express both parkin and USP30, a balanced level of mitophagy is maintained.

This study may have implications for the treatment of defective PINK1-parkin signalling in Parkinson's disease, and gives us a deeper understanding of this form of mitophagy in general. Inhibitors of USP30 might increase mitochondrial health under conditions in which this system is impaired, for example in patients with mutations in PINK1 or PARKIN. Indeed, Bingol and co-workers found that depletion of USP30 increased mitophagy in human cell lines altered to express a mutant form of PARKIN.

If this activity can be extended to neurons, it is possible that USP30 inhibitors could be beneficial to patients. The development of effective USP14 inhibitors suggests that selective targeting of this enzyme class is a possibility. USP30 inhibitors might also improve mitochondrial health in cells with other types of defect, by reducing the threshold for PINK1-parkin-dependent mitochondrial clearance.

Precisely why dopamine-producing neurons are more sensitive to familial mutations in the PINK1-parkin pathway than other cells in the body is unclear. This work raises the interesting possibility that relative levels of USP30 and parkin in various cell types could determine the sensitivity of cells to defects in this pathway. However, a recent report suggests that a different family member, USP15, can similarly reverse the PINK1-parkin pathway. Further studies will be needed to understand the relationship between these two pathways.

Finally, Bingol and colleagues 
provide evidence that ubiquit- 
ination of TOM20 is required 
for mitophagy, providing a new 
potential link between mito- 
chondria and the mitophagy 
machinery. Exploring whether TOM20 
ubiquitination promotes assembly of the 
mitophagy machinery on mitochondria may 
help us to understand what continues to be 
a central puzzle in the field: the mechanism 
by which ubiquitinated mitochondria are 
recognized by autophagosomes, the vesicles 
that transport damaged mitochondria to 
be degraded. Now the race is on to determine whether releasing the parkin brake will benefit patients.
QUANTUM PHYSICS


Feel the force 


An approach based on quantum sensing, in which controlled quantum systems 
serve as precision sensors, has enabled measurement of the weak magnetic 
interaction between two electrons bound to two separate ions. SEE LETTER P.376 


FERDINAND SCHMIDT-KALER 


That electrons have negative charge and
repel each other are concepts familiar to most readers. Less well known
is that they also possess a magnetic moment 
associated with their spin, and therefore exert 
magnetic forces on one another. However, 
interactions between individual electron 
spins have hitherto not been measured. This 
is mainly because they are dwarfed by other 
effects: for small, atomic-scale separation 
between electrons, the Pauli exclusion prin- 
ciple, which states that two electrons cannot 
occupy the same quantum state, and the Cou- 
lomb electric interaction dominate; and for 
large separation, the strength of the magnetic 
interaction is vastly reduced and typically fully 
masked by the force that an electron’s magnetic 
moment experiences in an ambient fluctuat- 
ing magnetic field. On page 376 of this issue, 
Kotler et al. report how they have succeeded in
detecting the minuscule magnetic interaction 
between two electrons bound to two ions sepa- 
rated by about 2 micrometres, using ideas from 
the emerging field of quantum sensing. 

The quantum states of single photons, atoms 
and ions, and of impurity ions in crystals, can 
be controlled almost perfectly in the labora- 
tory. Initially, the development of experimen- 
tal techniques to control and manipulate such 
quantum states was motivated by an interest in 
testing the fundamental principles of quantum 
physics. Nowadays, advances in quantum-state 
manipulation are also targeted towards appli- 
cations such as quantum computing, quantum 
simulation and quantum sensing. Whereas 
quantum computing and simulation require 
exquisite control of interactions between large 
numbers of quantum particles, sensing appli- 
cations, in which quantum systems serve as 
sensing devices, are much less demanding in 
that regard. 

In their study, Kotler et al. control and mani- 
pulate the valence electrons of two strontium 
ions confined in an electrical device known as 
a Paul trap. To understand their work, think of 
the spin of a strontium ion’s valence electron 
as a tiny magnet with a north and a south pole 
— like the needle of a magnetic compass — 
that aligns with external magnetic fields. But 
imagine what happens if two such compass 
needles are placed close to each other. Now, 
one needle may interact with the other and 
rotate slightly, depending on the orientation of
the other needle. It is exactly this small effect
— the magnetic interaction of two single spins 
— that Kotler and colleagues measured in their 
experiment. 

To perform their measurements, the 
researchers first used laser pulses to cool 
the ions and initialize them such that the 
magnetic moments of the valence electrons 
pointed in opposite directions. Returning to 
the magnetic-compass analogy, both south 
poles are now facing and repelling each other 
(Fig. 1a). In addition, the interactions of the 
two electrons with a uniform external mag- 
netic field — which should be eliminated in 
order to measure the tiny spin-spin interac- 
tion strength — are balanced out, because they 
are of the same magnitude but opposite sign. 
Figure 1 | Measuring magnetic forces. Kotler et al. have measured the tiny interaction between two electron spins bound to two strontium ions about 2 micrometres apart. a, The electron spins of both ions can be illustrated by magnetic-compass needles, here aligned opposite to each other; blue indicates a south pole and red a north pole. b, By repeatedly flipping the needles’ directions rapidly and simultaneously, the authors could cancel out the interactions of the ions with a fluctuating external magnetic field (not shown). c, This allowed them to measure the magnetic interaction between both needles from the rotation that they undergo in the previous configurations.
Next, the directions of both compass needles
were rapidly and continuously flipped simulta- 
neously (Fig. 1b). This step helped to compen- 
sate for fluctuations of the external magnetic 
field, which are different in strength at the two 
ion locations. On average, any interactions of 
the ions with the fluctuating external mag- 
netic field essentially vanished, and Kotler and 
co-workers were set to measure the spin-spin 
interaction strength. 
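The averaging argument can be checked with a toy integration: a spin accumulates phase at a rate set by a slowly varying field, and each flip reverses the sign of that accumulation, so contributions from successive intervals nearly cancel. This is a generic spin-echo-style illustration, not Kotler and colleagues' actual pulse sequence; the field profile, units and flip interval are hypothetical.

```python
import math


def accumulated_phase(field, total_time, n_steps, flip_every=None):
    """Midpoint-rule integral of s(t) * B(t) dt for one spin.

    field: function giving the (hypothetical) precession rate caused by the
    external field at time t. If flip_every is set, the spin is flipped every
    flip_every integration steps, so its sign s(t) alternates +1, -1, +1, ...
    """
    dt = total_time / n_steps
    phase = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * dt
        sign = -1 if flip_every and (k // flip_every) % 2 == 1 else 1
        phase += sign * field(t) * dt
    return phase


def drifting_field(t):
    # Hypothetical field: a constant offset plus a slow fluctuation.
    return 1.0 + 0.3 * math.sin(math.pi * t)


free = accumulated_phase(drifting_field, 1.0, 100_000)
echoed = accumulated_phase(drifting_field, 1.0, 100_000, flip_every=1_000)
print(free, echoed)  # free is about 1.19; echoed is close to zero
```

The cancellation works because the field changes little between flips; a fluctuation that varied as fast as the flipping itself would survive the averaging.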

In the two configurations described 
above (south poles or north poles facing one 
another), the spin-spin interaction causes 
the magnetic moments to repel each other 
and start turning (Fig. 1c). In the experiment, the magnetic moments turn but do so
in a coherent quantum superposition. This is
a neat quantum trick, in which the electrons’ 
magnetic moments are forced to align with 
each other and eventually become quantum 
entangled. By measuring the properties of 
this carefully designed state, which is immune 
to magnetic noise and has a lifetime of almost 
1 minute, the authors could measure the rota- 
tion of the moments and thus the spin-spin 
interaction strength. Owing to the extremely 
small strength of the interaction and associated 
rotation rate (only 0.0009 hertz), the authors 
had to wait 15 seconds before they could deter- 
mine the rotation. 
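For a sense of scale, reading the quoted rotation rate of 0.0009 hertz as an ordinary frequency ν (an assumption about how the figure is stated), the angle turned during the 15-second wait is:

```latex
\[
\varphi = 2\pi \nu t
        = 2\pi \times 0.0009\,\mathrm{Hz} \times 15\,\mathrm{s}
        = 2\pi \times 0.0135
        \approx 0.085\,\mathrm{rad}
        \approx 4.9^{\circ}
\]
```

Even after 15 seconds, the moments have turned by only about 1.35% of a full revolution, which is why a state with a lifetime of almost a minute was needed.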

Kotler and colleagues’ study has broad 
ramifications for quantum sensing. The experi- 
mental sequence adopted by the authors may 
be readily applied to other atomic systems, as 
well as to molecular, optical and solid-state 
systems, with the prospect of using them as 
sensitive magnetic probes. The approach may 
be relevant for developing clocks based on 
trapped ions or atoms and for sensing small interactions in hybrid systems, such as mixtures of cold ions and atoms. Applying the technique to solid-state systems will be particularly interesting, because these systems offer prospects for commercial applications such as magnetic sensors operating in ‘noisy’ environments. Magnetic sensors based on single-nitrogen-atom impurities in diamond have already been produced and are close to reaching the sensitivity needed to detect a single nuclear spin. Physicists await further advances in quantum sensing with great interest.
Contextuality supplies the ‘magic’ for
quantum computation 


Mark Howard, Joel Wallman, Victor Veitch & Joseph Emerson


Quantum computers promise dramatic advantages over their classical counterparts, but the source of the power in 
quantum computing has remained elusive. Here we prove a remarkable equivalence between the onset of contextuality 
and the possibility of universal quantum computation via ‘magic state’ distillation, which is the leading model for exper- 
imentally realizing a fault-tolerant quantum computer. This is a conceptually satisfying link, because contextuality, which 
precludes a simple ‘hidden variable’ model of quantum mechanics, provides one of the fundamental characterizations of 
uniquely quantum phenomena. Furthermore, this connection suggests a unifying paradigm for the resources of quantum 
information: the non-locality of quantum theory is a particular kind of contextuality, and non-locality is already known 
to be a critical resource for achieving advantages with quantum communication. In addition to clarifying these funda- 
mental issues, this work advances the resource framework for quantum computation, which has a number of practical 
applications, such as characterizing the efficiency and trade-offs between distinct theoretical and experimental schemes 
for achieving robust quantum computation, and putting bounds on the overhead cost for the classical simulation of quantum algorithms.


Quantum information provides unique new capabilities for computation 
such as Shor’s factoring algorithm and quantum simulation algorithms.
This naturally raises the fundamental question: what unique resources 
of the quantum world enable the advantages of quantum information? 
There have been many attempts to answer this question, with proposals 
including the hypothetical ‘quantum parallelism’ some associate with 
quantum superposition, the necessity of large amounts of entanglement,
and much ado about quantum discord. Unfortunately none of these
proposals have proven satisfactory, and, in particular, none have helped
resolve outstanding challenges confronting the field. For example, on 
the theoretical side, the most general classes of problems for which quan- 
tum algorithms might offer an exponential speed-up over classical algo- 
rithms are poorly understood. On the experimental side, there remain 
significant challenges to the design of robust, large-scale quantum computers, and an important open problem is to determine the minimal physical requirements of a useful quantum computer. A framework
identifying relevant resources for quantum computation should help 
clarify these issues—for example, by identifying new simulation schemes 
for classes of quantum algorithms and by clarifying the trade-offs between 
the distinct physical requirements for achieving robust quantum computation. Here we establish that quantum contextuality, a generalization of non-locality identified almost 50 years ago, is a critical resource for quantum speed-up within the leading model for fault-tolerant quantum computation, known as magic state distillation (MSD).

Contextuality was first recognized as an intrinsic feature of quantum theory via the Bell-Kochen-Specker 'no-go' theorem. This theorem implies the impossibility of explaining the statistical predictions of quantum theory in a natural way. In particular, the actual outcome observed under a quantum measurement cannot be understood as simply revealing a pre-existing value of some underlying 'hidden variable'. A key observation is that the non-locality of quantum theory is a special case of contextuality. Under the locality restrictions motivating quantum communication, non-locality is a quantifiable cost for classical simulation complexity and a fundamental resource for practical applications such as device-independent quantum key distribution. Locality restrictions can be made relevant to measurement-based quantum computation, for which non-locality quantifies the resources required to evaluate non-linear functions. However, locality restrictions are not relevant in the standard quantum circuit model for quantum computation, and, in this context, a large amount of entanglement has been shown to be neither necessary nor sufficient for an exponential computational speed-up.

Here we consider the framework of fault-tolerant stabilizer quantum computation, which provides the most promising route to achieving robust universal quantum computation thanks to the discovery of high-threshold codes in two-dimensional geometries. In this framework, only a subset of quantum operations, namely stabilizer operations, can be achieved via a fault-tolerant encoding. These operations define a closed subtheory of quantum theory, the stabilizer subtheory, which is not universal and in fact admits an efficient classical simulation. The stabilizer subtheory can be promoted to universal quantum computation through MSD, which relies on a large number of ancillary resource states. Here we show that quantum contextuality plays a critical role in characterizing the suitability of quantum states for MSD. Our approach builds on recent work that has established a remarkable connection between contextuality and graph theory. We use the framework of refs 31 and 32 to identify non-contextuality inequalities such that the onset of state-dependent contextuality, using stabilizer measurements, coincides exactly with the possibility of universal quantum computing via MSD. The scope of our results differs depending on whether we consider a model of computation using qubits (systems of even prime dimension) or qudits (systems of odd prime dimension). We note that some authors use the term qudit to describe a system with an arbitrary number of levels. Whereas in both cases we can prove that violating a non-contextuality inequality is necessary for quantum-computational speed-up via MSD, in the qudit case we are able to prove that a state violates a non-contextuality inequality if and only if it lies outside the known boundary for MSD.


Graph-based contextuality 

Interpreting measurements on a quantum state as merely revealing a pre-existing property of the system leads to disagreement with the predictions of quantum theory. In quantum mechanics, a projective measurement can be decomposed as a set of binary tests. Contradictions with models using pre-existing value assignments can arise when these tests appear in multiple measurement scenarios, that is, in multiple contexts. In other words, we cannot always assign a definite value to tests appearing in multiple contexts, and consequently quantum mechanics cannot be described by a non-contextual hidden variable (NCHV) theory. The earliest demonstrations of quantum contextuality used sets of tests such that no NCHV model could reproduce the quantum predictions, regardless of which quantum state was actually measured. Recently, a more general framework has been derived in which a given set of tests can be considered to have non-contextual value assignments only if the measured state satisfies a non-contextuality inequality. We briefly review this framework below.

Consider a set of $n$ binary tests, which can be represented in quantum mechanics by a set of $n$ rank-1 projectors $\{\Pi_1, \ldots, \Pi_n\}$. Two such tests are compatible, and so can be simultaneously performed on a quantum system, if and only if the projectors are orthogonal. We define the witness operator $\Sigma$ for a set of tests to be

$$\Sigma = \sum_{i=1}^{n} \Pi_i \qquad (1)$$

and the associated exclusivity graph $\Gamma$ to be a graph wherein each vertex corresponds to a projector and two vertices are adjacent (connected) if the corresponding projectors are compatible. Only one outcome can occur when a measurement of a set of orthogonal projectors is performed, so we require that a value of 1 be assigned to at most one projector in each measurement. Since two vertices of $\Gamma$ are adjacent if and only if the corresponding projectors are compatible, the maximum value of $\Sigma$ in an NCHV model, $\langle \Sigma \rangle^{\mathrm{NCHV}}_{\max}$, is the independence number $\alpha(\Gamma)$, that is, the size of the largest set of vertices of $\Gamma$ such that no two elements of the set are adjacent.
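For small graphs, the NCHV bound $\alpha(\Gamma)$ can be checked directly by exhaustive search. The sketch below is illustrative only: the function name is hypothetical, the pentagon (the five-test exclusivity graph familiar from the KCBS inequality) is a standard example chosen here rather than one taken from this article, and the search is exponential, so it is only practical for a handful of vertices.

```python
from itertools import combinations


def independence_number(n, edges):
    """Return alpha(G) for a graph on vertices 0..n-1 by exhaustive search.

    Exponential time: intended only for small illustrative graphs.
    """
    adjacent = {frozenset(e) for e in edges}
    # Try subset sizes from largest to smallest; the first size that admits
    # a subset with no adjacent pair is the independence number.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in adjacent
                   for pair in combinations(subset, 2)):
                return size
    return 0


# Pentagon (5-cycle): test k is exclusive with tests k - 1 and k + 1.
pentagon = [(k, (k + 1) % 5) for k in range(5)]
print(independence_number(5, pentagon))  # 2
```

In an NCHV model at most two of the five pentagon tests can simultaneously take the value 1, so $\langle \Sigma \rangle^{\mathrm{NCHV}}_{\max} = 2$ for this graph.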

The maximum quantum mechanical (QM) value of $\Sigma$ can be obtained by varying over projectors satisfying the appropriate compatibility relations and over quantum states. This quantity is bounded above by the Lovász number, $\vartheta$, of the exclusivity graph, that is,

$$\langle \Sigma \rangle^{\mathrm{QM}}_{\max} \le \vartheta(\Gamma) \qquad (2)$$

where $\vartheta$ can be calculated as the solution to a semidefinite program. Graphs for which $\alpha(\Gamma) < \vartheta(\Gamma)$ indicate that appropriately chosen projectors $\{\Pi_i\}$ and states $\rho$ may reveal quantum contextuality by violating the non-contextuality inequality

$$\mathrm{Tr}(\Sigma\rho) \le \alpha(\Gamma) \qquad (3)$$
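As a concrete illustration of a violation of inequality (3), the sketch below uses the standard KCBS construction, which is not part of this article: five unit vectors in $\mathbb{R}^3$ in which consecutive vectors are orthogonal, so that their exclusivity graph is the pentagon with $\alpha = 2$. For the state $(1, 0, 0)$, $\mathrm{Tr}(\Sigma\rho)$ reaches $\sqrt{5}$, which is also the Lovász number of the pentagon. The helper names are hypothetical.

```python
import math


def kcbs_vectors():
    """Five unit vectors in R^3 in which vector k is orthogonal to vector k+1."""
    c = math.cos(math.pi / 5)
    cos_theta = math.sqrt(c / (1 + c))  # chosen so consecutive vectors are orthogonal
    sin_theta = math.sqrt(1 - cos_theta ** 2)
    vecs = []
    for k in range(5):
        phi = 4 * math.pi * k / 5
        vecs.append((cos_theta,
                     sin_theta * math.cos(phi),
                     sin_theta * math.sin(phi)))
    return vecs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


vecs = kcbs_vectors()
# The exclusivity graph is the pentagon: projector k is orthogonal to k + 1.
for k in range(5):
    assert abs(dot(vecs[k], vecs[(k + 1) % 5])) < 1e-12

psi = (1.0, 0.0, 0.0)
# For rho = |psi><psi|, Tr(Sigma rho) is the sum of squared overlaps.
value = sum(dot(psi, v) ** 2 for v in vecs)
print(value, math.sqrt(5))  # both approximately 2.2360679...
```

Because $\sqrt{5} > 2 = \alpha$, this state and these projectors violate inequality (3) for the pentagon.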
For generalized probabilistic theories, an important class of 'post-quantum' theories, the maximum value of $\Sigma$ is given by the fractional packing number $\alpha^*(\Gamma)$ of the exclusivity graph, that is,

$$\langle \Sigma \rangle^{\mathrm{GPT}}_{\max} = \alpha^*(\Gamma) \qquad (4)$$

Note that if $\alpha(\Gamma) < \langle \Sigma \rangle^{\mathrm{QM}}_{\max} = \alpha^*(\Gamma)$, then the optimal choice of quantum state and projectors is maximally contextual, in that no greater violation of the non-contextuality inequality can be obtained in any generalized probabilistic theory.


The stabilizer formalism 


Quantum information theory relies heavily on a family of finite groups usually called the (generalized) Pauli groups. The most promising and well-understood quantum error-correcting codes, stabilizer codes, are built using the elements of these groups, that is, Pauli operators. Qubits are the most commonly used building blocks for quantum computing, but a circuit using qudits has the same computational power. While qudits with larger dimensionality may pose new experimental challenges, these may be offset by a lower overhead for fault-tolerant computation. In this section we outline the mathematical structure associated with the generalized Pauli group and the geometrical characterization of probabilistic mixtures of stabilizer states.

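The stabilizer formalism for $p$-dimensional systems (where $p$ is a prime number) is defined using the generalized $X$ and $Z$ operators

$$X|j\rangle = |j+1\rangle, \qquad Z|j\rangle = \omega^{j}|j\rangle \qquad (5)$$

where $\omega = \exp(2\pi i/p)$ and addition is modulo $p$. The set of Weyl-Heisenberg displacement operators is defined as

$$\mathcal{D}_p = \{ D_{x,z} = \omega^{2^{-1}xz} X^{x} Z^{z} : x, z \in \mathbb{Z}_p \} \qquad (6)$$

where $2^{-1}$ is the multiplicative inverse of 2 in the finite field $\mathbb{Z}_p = \{0, 1, \ldots, p-1\}$. For $p = 2$, one can replace $\omega^{2^{-1}}$ with $i$ in equation (6) to recover the familiar qubit Pauli operators. The Clifford group $\mathcal{C}_{p,n}$ is defined to be the normalizer of the group $\langle \mathcal{D}_p^{\otimes n} \rangle$ (that is, the group generated by the set of displacement operators), that is,

$$\mathcal{C}_{p,n} = \{ U \in \mathrm{U}(p^n) : U \langle \mathcal{D}_p^{\otimes n} \rangle U^{\dagger} = \langle \mathcal{D}_p^{\otimes n} \rangle \} \qquad (7)$$

and the set of stabilizer states is the image of the computational basis under the Clifford group $\mathcal{C}_{p,n}$.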
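To make equations (5) and (6) concrete, here is a minimal pure-Python sketch that does not rely on any numerical library. It builds $X$, $Z$ and $D_{x,z}$ for an odd prime $p$ and checks two standard facts: the Weyl commutation relation $ZX = \omega XZ$, and $D_{x,z}^{\,p} = \mathbb{I}$. All helper names are hypothetical. The phase convention $\omega^{2^{-1}xz}$ follows equation (6) as written above, and the modular inverse uses Python's three-argument `pow`, which requires Python 3.8 or later.

```python
import cmath


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, e):
    """a**e for a non-negative integer e by repeated multiplication."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(e):
        result = mat_mul(result, a)
    return result


def close(a, b, tol=1e-9):
    """Entrywise comparison of two square matrices."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))


def pauli_xz(p):
    """Generalized X and Z of equation (5), plus omega = exp(2*pi*i/p)."""
    omega = cmath.exp(2j * cmath.pi / p)
    X = [[1 if i == (j + 1) % p else 0 for j in range(p)] for i in range(p)]  # X|j> = |j+1>
    Z = [[omega ** j if i == j else 0 for j in range(p)] for i in range(p)]   # Z|j> = omega^j |j>
    return X, Z, omega


def displacement(p, x, z):
    """D_{x,z} = omega^(2^{-1} x z) X^x Z^z for an odd prime p (equation (6))."""
    X, Z, omega = pauli_xz(p)
    half = pow(2, -1, p)  # inverse of 2 in Z_p (Python 3.8+)
    phase = omega ** ((half * x * z) % p)
    m = mat_mul(mat_pow(X, x), mat_pow(Z, z))
    return [[phase * m[i][j] for j in range(p)] for i in range(p)]


p = 3
X, Z, omega = pauli_xz(p)
identity = mat_pow(X, 0)

# Weyl commutation relation: Z X = omega X Z.
XZ = mat_mul(X, Z)
assert close(mat_mul(Z, X), [[omega * v for v in row] for row in XZ])

# For odd p every displacement operator satisfies D^p = I.
for x in range(p):
    for z in range(p):
        assert close(mat_pow(displacement(p, x, z), p), identity)
```

Dense nested lists keep the sketch dependency-free. A numerical library would be the natural choice for anything larger than a few qudits.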

The stabilizer polytope is the convex hull of the set of stabilizer states. For a single system, the stabilizer polytope is defined by the following set of simultaneous inequalities

$$\mathcal{P}_{\mathrm{STAB}} = \{ \rho : \mathrm{Tr}(\rho A^{\mathbf{q}}) \ge 0,\ \mathbf{q} \in \mathbb{Z}_p^{p+1} \} \qquad (8)$$

where $A^{\mathbf{q}} = -\mathbb{I}_p + \sum_{j=1}^{p+1} \Pi_j^{q_j}$ and $\Pi_j^{q_j}$ is the projector onto the eigenvector with eigenvalue $\omega^{q_j}$ of the $j$th operator in the list $\{D_{0,1}, D_{1,0}, D_{1,1}, \ldots, D_{1,p-1}\}$ (the eigenbases of these operators form a complete set of mutually unbiased bases). In the preceding expression $\mathbb{I}_p$ is the $p \times p$ identity matrix and $\mathbf{q}$ is a vector of length $p+1$ with entries from $\{0, 1, \ldots, p-1\}$.
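For a single qubit ($p = 2$), equation (8) reduces to a simple test on the Bloch vector. The list $\{D_{0,1}, D_{1,0}, D_{1,1}\}$ is $\{Z, X, Y\}$, and $\Pi_j^{q_j} = (\mathbb{I} + (-1)^{q_j} P_j)/2$. Therefore $A^{\mathbf{q}} = (\mathbb{I} + \sum_j (-1)^{q_j} P_j)/2$, and $\mathrm{Tr}(\rho A^{\mathbf{q}}) = (1 + \sum_j (-1)^{q_j}\langle P_j \rangle)/2$. The sketch below encodes this reduction, which is derived here from equation (8) rather than quoted from the article. The function names are hypothetical. The example state with Bloch vector $(1,1,1)/\sqrt{3}$ is the qubit magic state $|T\rangle$.

```python
import itertools
import math


def facet_values(bloch):
    """Tr(rho A^q) for every q in Z_2^3, given a qubit Bloch vector (bx, by, bz).

    Uses Tr(rho A^q) = (1 + sum_j (-1)^{q_j} <P_j>) / 2 with the operator list
    ordered as (Z, X, Y), matching {D_{0,1}, D_{1,0}, D_{1,1}}.
    """
    bx, by, bz = bloch
    expectations = (bz, bx, by)
    return {
        q: 0.5 * (1 + sum((-1) ** qj * e for qj, e in zip(q, expectations)))
        for q in itertools.product((0, 1), repeat=3)
    }


def in_stabilizer_polytope(bloch, tol=1e-12):
    """True if every facet inequality Tr(rho A^q) >= 0 of equation (8) holds."""
    return min(facet_values(bloch).values()) >= -tol


t = 1 / math.sqrt(3)
print(in_stabilizer_polytope((0, 0, 0)))  # True: maximally mixed state
print(in_stabilizer_polytope((0, 0, 1)))  # True: |0> is a stabilizer state (on a facet)
print(in_stabilizer_polytope((t, t, t)))  # False: |T> lies outside the polytope
```

The minimum over the eight facets is $(1 - |b_x| - |b_y| - |b_z|)/2$, so for qubits the polytope is the octahedron $|b_x| + |b_y| + |b_z| \le 1$.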


Magic state distillation 


The stabilizer formalism of the previous section was developed in the search for quantum error-correcting codes, that is, codes allowing the robust, fault-tolerant storage and manipulation of quantum information stored across many subsystems. Surface codes, in particular, admit a comparatively high fault-tolerance threshold within an experimentally realistic planar physical layout. Codes such as these have a finite, non-universal set of transversal (that is, manifestly fault-tolerant) operations that must be supplemented with an additional resource, a supply of so-called magic states, in order to attain universality. MSD refers to the subroutine, described below, wherein almost pure resource states are constructed using large numbers of impure resource states.

An MSD protocol consists of the following steps: (1) prepare $n$ copies of a suitable (see below) input state, that is, $\rho_{\mathrm{in}}^{\otimes n}$; (2) perform a Clifford operation on $\rho_{\mathrm{in}}^{\otimes n}$; (3) perform a stabilizer measurement on all but the first $m$ registers, postselecting on a desired outcome. With appropriate choices of stabilizer operations, the resulting output state in the first $m$ registers, $\rho_{\mathrm{out}}$, is purified in the direction of a magic state $|\psi\rangle$, so that $\langle\psi|\rho_{\mathrm{out}}|\psi\rangle > \langle\psi|\rho_{\mathrm{in}}|\psi\rangle$. This process can be reiterated until $\rho_{\mathrm{out}}$ is sufficiently pure, at which point the resource $\rho_{\mathrm{out}}$ is used up to approximate a non-Clifford operation (via 'state injection'), for example the $\pi/8$ gate or its qudit generalizations. Supplementing stabilizer operations with the ability to perform such gates enables fault-tolerant and universal quantum computation.

For which states $\rho_{\mathrm{in}}$ does there exist an MSD routine purifying $\rho_{\mathrm{out}}$ towards a non-stabilizer state? A large subset of quantum states has been ruled out by virtue of the fact that efficient classical simulation schemes are known for noiseless stabilizer circuits supplemented by access to an arbitrary number of states from the polytope $\rho_{\mathrm{in}} \in \mathcal{P}_{\mathrm{SIM}}$ (refs 30, 35, 36). This polytope $\mathcal{P}_{\mathrm{SIM}}$ of the known simulable states is described by

$$\mathcal{P}_{\mathrm{SIM}} = \begin{cases} \{\rho : \mathrm{Tr}(\rho A^{\mathbf{r}}) \ge 0,\ \mathbf{r} \in \mathbb{Z}_2^{3}\} & p = 2 \\ \{\rho : \mathrm{Tr}(\rho A^{x\mathbf{a}+z\mathbf{b}}) \ge 0,\ x, z \in \mathbb{Z}_p\} & p > 2 \end{cases} \qquad (9)$$

where $\mathbf{a} = [1, 0, 1, \ldots, p-1]$ and $\mathbf{b} = -[0, 1, 1, \ldots, 1]$ (ref. 38). Note that $\mathcal{P}_{\mathrm{SIM}} = \mathcal{P}_{\mathrm{STAB}}$ for qubits (giving an octahedron inscribed within the Bloch sphere), whereas $\mathcal{P}_{\mathrm{SIM}} \supset \mathcal{P}_{\mathrm{STAB}}$ is a proper superset for all other primes. Subsequently we refer to the set of facets enclosing $\mathcal{P}_{\mathrm{SIM}}$ as

$$\mathcal{A}_{\mathrm{SIM}} = \{ A^{\mathbf{r}} \mid p = 2: \mathbf{r} \in \mathbb{Z}_2^{3};\ p \ne 2: \mathbf{r} = x\mathbf{a} + z\mathbf{b} \} \qquad (10)$$

Figure 1 | A two-dimensional slice through qutrit state space. Three distinct regions in the space of 3 × 3 matrix operators: the region shaded in pale green describes quantum state space (valid density operators); region $\mathcal{P}_{\mathrm{SIM}}$, with hatched shading, corresponds to ancillas known to be efficiently simulable (and hence useless for quantum computation via MSD); and the dark red region $\mathcal{P}_{\mathrm{STAB}}$ describes mixtures of stabilizer states. The strict inclusion $\mathcal{P}_{\mathrm{STAB}} \subset \mathcal{P}_{\mathrm{SIM}}$ identifies a large class of bound magic states.


In Fig. 1 we plot the geometric relationship between arbitrary quantum states and the sets of states contained within $\mathcal{P}_{\mathrm{SIM}}$ and $\mathcal{P}_{\mathrm{STAB}}$ for the case of qutrits ($p = 3$).
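For qutrits, the facets $A^{x\mathbf{a}+z\mathbf{b}}$ of equation (9) can be built directly from the definition in equation (8). Each eigenprojector is obtained as $\Pi^{s} = \frac{1}{p}\sum_{k} \omega^{-sk} D^{k}$, which holds because $D^{p} = \mathbb{I}$ and each $p$th root of unity appears once as an eigenvalue. The pure-Python sketch below makes the following assumptions. The helper names are hypothetical. The example state $(|1\rangle - |2\rangle)/\sqrt{2}$, sometimes called the qutrit 'strange state', is chosen here for illustration. The sketch checks that this state has $\mathrm{Tr}(\rho A^{\mathbf{r}}) = -1$ on one facet, so it lies outside $\mathcal{P}_{\mathrm{SIM}}$. It also checks that the $p^2$ facets sum to $p\,\mathbb{I}_p$, so that the values $\mathrm{Tr}(\rho A^{\mathbf{r}})/p$ sum to 1, like a quasiprobability distribution.

```python
import cmath
import math


def identity(n):
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def displacement(p, x, z):
    """D_{x,z} = omega^(2^{-1} x z) X^x Z^z; <i|D|j> is nonzero only for i = j + x."""
    omega = cmath.exp(2j * cmath.pi / p)
    half = pow(2, -1, p)
    return [[omega ** ((half * x * z + z * j) % p) if i == (j + x) % p else 0j
             for j in range(p)] for i in range(p)]


def eigenprojector(p, d, s):
    """Projector onto the omega^s eigenvector of d: (1/p) sum_k omega^(-s k) d^k."""
    omega = cmath.exp(2j * cmath.pi / p)
    total = [[0j] * p for _ in range(p)]
    power = identity(p)
    for k in range(p):
        coeff = omega ** ((-s * k) % p) / p
        total = [[total[i][j] + coeff * power[i][j] for j in range(p)] for i in range(p)]
        power = mat_mul(power, d)
    return total


def facet(p, r):
    """A^r = -I + sum_j Pi_j^{r_j} over D_{0,1}, D_{1,0}, D_{1,1}, ..., D_{1,p-1}."""
    ops = [(0, 1)] + [(1, z) for z in range(p)]
    a = [[-m for m in row] for row in identity(p)]
    for (x, z), rj in zip(ops, r):
        proj = eigenprojector(p, displacement(p, x, z), rj)
        a = [[a[i][j] + proj[i][j] for j in range(p)] for i in range(p)]
    return a


def sim_facets(p):
    """All r = x*a + z*b (mod p) with a = [1,0,1,...,p-1], b = -[0,1,...,1]."""
    a_vec = [1, 0] + list(range(1, p))
    b_vec = [0] + [(-1) % p] * p
    return [[(x * aj + z * bj) % p for aj, bj in zip(a_vec, b_vec)]
            for x in range(p) for z in range(p)]


def expectation(op, psi):
    """Real part of <psi| op |psi> for a normalized pure state psi."""
    n = len(psi)
    return sum(psi[i].conjugate() * op[i][j] * psi[j]
               for i in range(n) for j in range(n)).real


p = 3
strange = [0j, 1 / math.sqrt(2) + 0j, -1 / math.sqrt(2) + 0j]
values = [expectation(facet(p, r), strange) for r in sim_facets(p)]
print(min(values))  # approximately -1, so the state lies outside P_SIM
print(sum(values))  # approximately 3, since the p^2 facets sum to p times the identity
```

For the maximally mixed state, every facet gives $\mathrm{Tr}(A^{\mathbf{r}})/p = 1/p > 0$, because each $A^{\mathbf{r}}$ has unit trace.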

By prior results (refs 35, 39), we know that the set of states $\mathcal{P}_{\mathrm{SIM}}$ coincides exactly with the set of states that are non-negatively represented within a distinguished quasiprobability representation, a discrete Wigner function. Are the states in the set $\mathcal{P}_{\mathrm{SIM}}$, the set excluded from MSD by the known efficient simulation schemes, the complete set of non-distillable states? We now address this fundamental question by demonstrating a remarkable relationship between non-distillability, non-negativity and non-contextuality.


Contextuality as a computational resource 

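We will prove that all states $\rho \notin \mathcal{P}_{\mathrm{SIM}}$ exhibit state-dependent contextuality with respect to stabilizer measurements. Our definition of stabilizer measurement is quite inclusive: we allow all projective measurements whose elements are rank-1 projectors onto stabilizer states. Rearranging the definition of $A^{\mathbf{r}}$ given in equation (8) gives

$$\sum_{j=1}^{p+1} \sum_{\substack{s \in \mathbb{Z}_p \\ s \ne r_j}} \Pi_j^{s} = p\,\mathbb{I}_p - A^{\mathbf{r}} \qquad (11)$$

that is, the set of projectors $\{\Pi_j^{s} : s \ne r_j\}$ is a set of projectors whose sum, $\Sigma^{\mathbf{r}}$, is such that

$$\mathrm{Tr}(\Sigma^{\mathbf{r}}\rho) > p \iff \mathrm{Tr}(A^{\mathbf{r}}\rho) < 0 \qquad (12)$$

The left-hand side of this equivalence is a witness for contextuality if and only if the independence number of the associated graph $\Gamma^{\mathbf{r}}$ satisfies $\alpha(\Gamma^{\mathbf{r}}) = p$, as in equation (3). In fact, this simple construction fails to identify any quantum states as contextual, because $\mathrm{Tr}(\Sigma^{\mathbf{r}}\rho) \le \alpha(\Gamma^{\mathbf{r}})$ for all $\rho$. Indeed, $\Gamma^{\mathbf{r}}$ is a disjoint union of $p+1$ cliques, one per basis, so $\alpha(\Gamma^{\mathbf{r}}) = p+1$, while $\mathrm{Tr}(\Sigma^{\mathbf{r}}\rho) = p - \mathrm{Tr}(A^{\mathbf{r}}\rho) \le p+1$ because $A^{\mathbf{r}} \ge -\mathbb{I}_p$. This is not surprising, given that every single-qudit stabilizer projector is part of exactly one context, namely the basis (one of the complete set of mutually unbiased bases) in which it is contained.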
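The failure of the single-qudit construction can be checked combinatorially. In the sketch below, vertices are labels $(j, s)$ with $s \ne r_j$. Projectors from the same basis are orthogonal (adjacent). Projectors from different mutually unbiased bases have overlap $1/p$, so they are not adjacent; this is taken as given from the text rather than computed. The exact independence number then comes out as $p+1$, which exceeds the threshold $p$ in equation (12). The function names are hypothetical, and the include/exclude branching search is a generic exact method chosen here for brevity.

```python
def single_qudit_graph(p, r):
    """Exclusivity graph of {Pi_j^s : s != r_j} from equation (11).

    Vertices are labels (j, s). Two projectors are adjacent (orthogonal)
    exactly when they come from the same basis j.
    """
    vertices = [(j, s) for j in range(p + 1) for s in range(p) if s != r[j]]
    adj = {v: {u for u in vertices if u != v and u[0] == v[0]} for v in vertices}
    return vertices, adj


def max_independent_set_size(vertices, adj):
    """Exact maximum independent set size by include/exclude branching."""
    if not vertices:
        return 0
    v, rest = vertices[0], vertices[1:]
    skip_v = max_independent_set_size(rest, adj)
    take_v = 1 + max_independent_set_size([u for u in rest if u not in adj[v]], adj)
    return max(skip_v, take_v)


for p in (2, 3, 5):
    alpha = max_independent_set_size(*single_qudit_graph(p, [0] * (p + 1)))
    # alpha = p + 1 exceeds the threshold p of equation (12), so no state is flagged.
    print(p, alpha)
```

Because the graph is a disjoint union of cliques, the branching explores only about $p^{p+1}$ paths, which is fast for the small primes shown.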

Stabilizer projectors appear in multiple contexts only when two or more subsystems are involved. Consequently, we introduce two-qudit stabilizer projectors such that the structure of $A^{\mathbf{r}}$ as in equation (11) is reflected on the first qudit only. We can limit consideration to two-qudit projectors because this approach characterizes as contextual all single-qudit states that do not have an NCHV model via the discrete Wigner function; that is, we find that two-qudit projectors are sufficient to achieve the best possible result.

Our construction uses a different set of projectors for each facet $A^{\mathbf{r}}$. For a fixed facet $A^{\mathbf{r}}$, we define a set of separable projectors

$$\{\Pi\}^{\mathbf{r}}_{\mathrm{sep}} = \{ \Pi_j^{s} \otimes |k\rangle\langle k| : 1 \le j \le p+1,\ s \ne r_j,\ k \in \mathbb{Z}_p \} \qquad (13)$$

that is, we take the $p(p^2-1)$ separable projectors consisting of all tensor products of projectors in equation (11) for the first qudit and computational basis states for the second qudit. We also define the set $\{\Pi\}_{\mathrm{ent}}$ to be the set of all two-qudit entangled stabilizer projectors.
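The sizes of these projector sets follow from counting stabilizer states. The sketch below uses the standard count of pure $n$-qudit stabilizer states for prime $p$, namely $p^n\prod_{k=1}^{n}(p^k+1)$. This formula is quoted from the stabilizer literature, not from this article, so treat it as an assumption. Subtracting the product states gives the entangled projectors. For $p = 2$ the construction has 30 projectors, and the separable cliques plus entangled bases number $p^3 + 1$. The function names are hypothetical.

```python
def num_stabilizer_states(p, n):
    """Pure n-qudit stabilizer states for prime p: p^n * prod_{k=1}^{n} (p^k + 1)."""
    count = p ** n
    for k in range(1, n + 1):
        count *= p ** k + 1
    return count


def construction_sizes(p):
    """Sizes of the projector sets {Pi}^r_sep and {Pi}_ent for prime p."""
    product_states = num_stabilizer_states(p, 1) ** 2
    entangled = num_stabilizer_states(p, 2) - product_states
    separable = p * (p ** 2 - 1)  # equation (13): (p+1) bases x (p-1) projectors x p
    return {
        "separable": separable,
        "entangled": entangled,
        "vertices": separable + entangled,
        "separable_cliques": p + 1,              # one per single-qudit basis j
        "entangled_bases": entangled // p ** 2,  # each two-qudit basis has p^2 elements
    }


for p in (2, 3, 5):
    print(p, construction_sizes(p))  # p = 2 gives 30 vertices
```

The vertex count of 30 for $p = 2$ matches the two-qubit graph in Fig. 2. The entangled count simplifies to $p^3(p^2-1)$, grouped into $p^3 - p$ bases.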

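The sum of the combined set of separable and entangled projectors, $\{\Pi\}^{\mathbf{r}} = \{\Pi\}^{\mathbf{r}}_{\mathrm{sep}} \cup \{\Pi\}_{\mathrm{ent}}$, is

$$\Sigma^{\mathbf{r}} = (p^{3}\,\mathbb{I}_p - A^{\mathbf{r}}) \otimes \mathbb{I}_p \qquad (14)$$

so that for any state $\sigma$ of the second system (even the maximally mixed state) we have

$$\mathrm{Tr}[\Sigma^{\mathbf{r}}(\rho \otimes \sigma)] \le p^{3} \iff \mathrm{Tr}[A^{\mathbf{r}}\rho] \ge 0 \qquad (15)$$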
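Equation (14) makes the witness value easy to evaluate: $\mathrm{Tr}[\Sigma^{\mathbf{r}}(\rho\otimes\sigma)] = p^3 - \mathrm{Tr}(A^{\mathbf{r}}\rho)$, and $\sigma$ drops out. The arithmetic sketch below applies this to three example inputs chosen here for illustration:

- the qubit magic state $|T\rangle$, whose most negative facet value is $(1-\sqrt{3})/2$, obtained from the Bloch vector $(1,1,1)/\sqrt{3}$ in the qubit form of equation (8);
- a qutrit state with $\mathrm{Tr}(A^{\mathbf{r}}\rho) = -1$;
- the maximally mixed qutrit.

The function names are hypothetical.

```python
import math


def witness_value(p, tr_a_rho):
    """Tr[Sigma^r (rho x sigma)] for normalized rho and sigma.

    By equation (14), Sigma^r = kron(p^3 I - A^r, I), so the value is
    p^3 - Tr(A^r rho); the state sigma of the second qudit drops out.
    """
    return p ** 3 - tr_a_rho


def violates(p, tr_a_rho):
    """True when the non-contextuality bound p^3 of equation (15) is exceeded."""
    return witness_value(p, tr_a_rho) > p ** 3


# Qubit magic state |T>: minimum facet value (1 - sqrt(3))/2 < 0.
t_val = (1 - math.sqrt(3)) / 2
print(witness_value(2, t_val))   # about 8.366, above the bound 8
# A qutrit state with Tr(A^r rho) = -1 reaches p^3 + 1.
print(witness_value(3, -1))      # 28
# Maximally mixed qutrit: Tr(A^r rho) = Tr(A^r)/3 = 1/3 for every facet.
print(violates(3, 1 / 3))        # False
```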


Forming the exclusivity graph $\Gamma^{\mathbf{r}}$ of $\{\Pi\}^{\mathbf{r}}$ and applying the results of refs 31 and 32 identifies the left-hand side of equation (15) as a witness for the contextuality of $\rho$. The following theorem shows that the inequality on the left-hand side of equation (15) is indeed a non-contextuality inequality.

Theorem 1. The independence number of the exclusivity graph associated with $\Sigma^{\mathbf{r}}$ is $\alpha(\Gamma^{\mathbf{r}}) = p^{3}$ for all $A^{\mathbf{r}} \in \mathcal{A}_{\mathrm{SIM}}$ and all primes $p \ge 2$. Furthermore, for $p > 2$, a state exhibits contextuality if and only if it violates one of our non-contextuality inequalities, and maximally contextual states saturate the bound on contextuality associated with post-quantum generalized probabilistic theories, that is,

$$\langle \Sigma^{\mathbf{r}} \rangle^{\mathrm{QM}}_{\max} = \vartheta(\Gamma^{\mathbf{r}}) = \alpha^*(\Gamma^{\mathbf{r}}) = p^{3} + 1 \qquad (p > 2)$$

Theorem 1 says that, relative to our construction, exactly the states $\rho \notin \mathcal{P}_{\mathrm{SIM}}$ are those that exhibit contextuality. For qudits of odd prime dimension there does not exist any construction using stabilizer measurements that characterizes any $\rho \in \mathcal{P}_{\mathrm{SIM}}$ as contextual, so the conditions for contextuality and the possibility of quantum speed-up via MSD coincide exactly.

Proof. For $p = 2$, a software package can be used to obtain

$$\alpha(\Gamma^{\mathbf{r}}) = 8 < \vartheta(\Gamma^{\mathbf{r}}) < \alpha^*(\Gamma^{\mathbf{r}}) < 9 \qquad (16)$$

The exclusivity graph $\Gamma^{\mathbf{r}}$ and an independent set of 8 vertices are depicted in Fig. 2. The maximal violation of our non-contextuality inequality is achieved by the state $|T\rangle\langle T| \otimes \sigma$, where $|T\rangle$ is the magic state introduced in ref. 14 and $\sigma$ is arbitrary.

For $p > 2$, we will now show that $\vartheta(\Gamma^{\mathbf{r}}) = \alpha^*(\Gamma^{\mathbf{r}}) = p^{3} + 1$. To do this, we use the graph-theoretical inequality

$$\alpha(\Gamma) \le \langle \Sigma \rangle^{\mathrm{QM}}_{\max} \le \vartheta(\Gamma) \le \alpha^*(\Gamma) \le \bar{\chi}(\Gamma) \qquad (17)$$

where $\bar{\chi}(\Gamma) \in \mathbb{N}$ is the clique cover number, which is the minimum number of cliques needed to cover every vertex of $\Gamma$. (A clique is a subset of a graph's vertices wherein every pair of vertices is connected.) The clique cover number cannot be greater than the number of distinct bases in $\{\Pi\}^{\mathbf{r}}$, which contains $p+1$ separable bases and $p^{3} - p$ entangled bases. Therefore $\vartheta(\Gamma^{\mathbf{r}}) \le \alpha^*(\Gamma^{\mathbf{r}}) \le p^{3} + 1$. Then, since there exist quantum states $\rho$ such that $\mathrm{Tr}(A^{\mathbf{r}}\rho) = -1$, equation (14) gives $\mathrm{Tr}[\Sigma^{\mathbf{r}}(\rho \otimes \sigma)] = p^{3} + 1$. Hence $\langle \Sigma^{\mathbf{r}} \rangle^{\mathrm{QM}}_{\max} \ge p^{3} + 1$, and equation (17) gives $\vartheta(\Gamma^{\mathbf{r}}) = \alpha^*(\Gamma^{\mathbf{r}}) = p^{3} + 1$.

Figure 2 | Our construction applied to two qubits. Each of the 30 vertices in this graph $\Gamma^{\mathbf{r}}$ corresponds to a two-qubit stabilizer state; connected vertices correspond to orthogonal states. A maximum independent set (representing mutually non-orthogonal states) of size $\alpha(\Gamma^{\mathbf{r}}) = 8$ is highlighted in red. As described in Theorem 1 (main text), this value of $\alpha$ identifies all states $\rho \notin \mathcal{P}_{\mathrm{SIM}}$ as exhibiting contextuality with respect to the stabilizer measurements in our construction.

The statement that no non-contextuality inequality constructed from stabilizer measurements can be violated by any state $\rho \in \mathcal{P}_{\mathrm{SIM}}$ for odd prime dimensions follows from the existence of an NCHV model, namely the discrete Wigner function, for all stabilizer measurements and states $\rho \in \mathcal{P}_{\mathrm{SIM}}$.

We defer the proof that $\alpha(\Gamma^{\mathbf{r}}) = p^{3}$ to Supplementary Information.


Significance and outlook 

For qudits (which we defined as systems of odd prime dimension), a state is non-contextual under the available set of measurements (stabilizer measurements) if and only if it lies in the polytope $\mathcal{P}_{\mathrm{SIM}}$ (the set of ancilla states known to be useless for any MSD routine). The same construction applied to qubits also identifies all $\rho \notin \mathcal{P}_{\mathrm{SIM}}$ as contextual. These results establish that contextuality is a necessary resource for universal quantum computation via MSD.

For qudits, the set of states proven to be contextual by our construction has been previously conjectured to be sufficient to promote stabilizer circuits to universality. Proving this conjecture would require proving that any state $\rho \notin \mathcal{P}_{\mathrm{SIM}}$ can be distilled to a magic state. While substantial progress in this direction has been made, it is still an open problem.

For qubits, the mere presence of contextuality cannot be sufficient to promote stabilizer circuits to universality, since any state $\rho \in \mathcal{P}_{\mathrm{SIM}}$ (which includes the maximally mixed state) can violate a non-contextuality inequality constructed from stabilizer measurements. For example, converting the Peres-Mermin magic square to a 24-ray (projector) proof of contextuality and applying the formalism of refs 31 and 32 gives a non-contextuality inequality that is violated by all two-qubit states, including states of the form $\rho \otimes \sigma = \mathbb{I}/4$.
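The state independence of the Peres-Mermin argument can be seen in its observable form, shown below. This is not the 24-ray projector version or the refs 31 and 32 inequality mentioned above. Nine two-qubit Pauli observables are arranged so that every row multiplies to $+\mathbb{I}$ and the columns multiply to $+\mathbb{I}, +\mathbb{I}, -\mathbb{I}$. The sketch verifies these products with small pure-Python matrices. It then checks by brute force that no assignment of $\pm 1$ values respects all six constraints. No quantum state enters the argument. The particular arrangement of the square is the standard one, chosen here as an assumption.

```python
from itertools import product

# Single-qubit Paulis as 2x2 complex matrices.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Tensor product of two 2x2 matrices."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def sign_of_identity(m):
    """Return +1 or -1 if the 4x4 matrix m equals +I or -I, else None."""
    for s in (1, -1):
        if all(abs(m[i][j] - (s if i == j else 0)) < 1e-12
               for i in range(4) for j in range(4)):
            return s
    return None


square = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, Z), kron(Z, X), kron(Y, Y)],
]

row_signs = [sign_of_identity(mul(mul(r[0], r[1]), r[2])) for r in square]
col_signs = [sign_of_identity(mul(mul(square[0][c], square[1][c]), square[2][c]))
             for c in range(3)]
print(row_signs, col_signs)  # [1, 1, 1] [1, 1, -1]


def satisfies(values):
    """Does a +/-1 assignment to the nine cells reproduce every row and column product?"""
    v = [values[0:3], values[3:6], values[6:9]]
    rows_ok = all(v[i][0] * v[i][1] * v[i][2] == row_signs[i] for i in range(3))
    cols_ok = all(v[0][c] * v[1][c] * v[2][c] == col_signs[c] for c in range(3))
    return rows_ok and cols_ok


solutions = [vals for vals in product((1, -1), repeat=9) if satisfies(vals)]
print(len(solutions))  # 0: no non-contextual value assignment exists
```

Multiplying all nine values row by row gives $+1$, while multiplying column by column gives $-1$. This parity clash is why no assignment survives.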

The crucial difference between qubits and qudits is that state-independent contextuality (like that of the Peres-Mermin square) is never manifested within the qudit stabilizer formalism. Consequently, for qudits, any contextuality is necessarily state-dependent, and our results show that this contextuality has an operational meaning as a necessary and possibly sufficient resource for the 'magic' that makes quantum computers work. In the case of qubits, it is a pressing open question whether a suitable operationally motivated refinement or quantification of contextuality can align more precisely with the potential to provide a quantum speed-up.


METHODS SUMMARY 


Here we outline the argument that we use to prove that, for odd-prime $p$, the independence number of $\Gamma^{\mathbf{r}}$ is $p^{3}$. Recall that the independence number of a graph $\Gamma$ is the size of the largest independent set of $\Gamma$, where an independent set is a set of vertices of which no two are connected. Since two vertices are connected if and only if the associated projectors commute, an independent set in $\Gamma^{\mathbf{r}}$ is equivalent to a set of mutually non-commuting projectors in $\{\Pi\}^{\mathbf{r}}$. Because the elements of $\{\Pi\}^{\mathbf{r}}$ are all rank 1, two elements are non-commuting if and only if they are non-orthogonal. We prove $\alpha(\Gamma^{\mathbf{r}}) = p^{3}$ by proving $\alpha(\Gamma^{\mathbf{r}}) \ge p^{3}$ and $\alpha(\Gamma^{\mathbf{r}}) < p^{3} + 1$. This completes the proof since $\alpha(\Gamma^{\mathbf{r}})$ is an integer. In Theorem 2 we show that $\alpha(\Gamma^{\mathbf{r}}) \ge p^{3}$ by showing that there exists a set of $p^{3}$ mutually non-orthogonal elements of $\{\Pi\}^{\mathbf{r}}$ for any $A^{\mathbf{r}} \in \mathcal{A}_{\mathrm{SIM}}$. In Lemmas 3-5 we parametrize the set of stabilizer projectors using the symplectic representation of the Clifford group in order to transform a condition of mutual non-orthogonality of projectors into a set of algebraic constraints on parameters. In Theorem 6 we then show that $\alpha(\Gamma^{\mathbf{r}}) < p^{3} + 1$ by showing that no subset of $p^{3} + 1$ elements of $\{\Pi\}^{\mathbf{r}}$ can satisfy the constraints established in Lemmas 3-5, that is, there cannot exist a subset of $p^{3} + 1$ mutually non-orthogonal elements of $\{\Pi\}^{\mathbf{r}}$.
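The equivalence between independent sets and mutually non-orthogonal rank-1 projectors can be seen on a toy case: the six single-qubit stabilizer states. This example is chosen here for illustration and is not one of the sets $\{\Pi\}^{\mathbf{r}}$ analysed in Theorems 2-6, which rely on the symplectic parametrization in Supplementary Information. The brute-force sketch below, with hypothetical helper names, finds the largest mutually non-orthogonal subsets. Each such subset takes one state from each of the three orthogonal pairs.

```python
import math
from itertools import combinations

s = 1 / math.sqrt(2)
# The six single-qubit stabilizer states: eigenvectors of Z, X and Y.
STATES = {
    "0": (1, 0), "1": (0, 1),
    "+": (s, s), "-": (s, -s),
    "+i": (s, 1j * s), "-i": (s, -1j * s),
}


def overlap(u, v):
    """|<u|v>|^2 for two normalized vectors."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v))) ** 2


def largest_non_orthogonal_sets(states, tol=1e-12):
    """All maximum-size sets of mutually non-orthogonal states (brute force).

    For rank-1 projectors, adjacency in the exclusivity graph means
    orthogonality, so these are exactly the maximum independent sets.
    """
    labels = list(states)
    for size in range(len(labels), 0, -1):
        found = [c for c in combinations(labels, size)
                 if all(overlap(states[a], states[b]) > tol
                        for a, b in combinations(c, 2))]
        if found:
            return size, found
    return 0, []


alpha, sets = largest_non_orthogonal_sets(STATES)
print(alpha, sets[0])  # 3 ('0', '+', '+i')
```

Here $\alpha = 3$, and there are $2^3 = 8$ maximum independent sets. Exhaustive search of this kind does not scale to the two-qudit graphs, which is why the proof works with algebraic constraints instead.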
The genome of Eucalyptus grandis

Eucalypts are the world's most widely planted hardwood trees. Their outstanding diversity, adaptability and growth have made them a global renewable resource of fibre and energy. We sequenced and assembled >94% of the 640-megabase genome of Eucalyptus grandis. Of 36,376 predicted protein-coding genes, 34% occur in tandem duplications, the largest proportion thus far in plant genomes. Eucalyptus also shows the highest diversity of genes for specialized metabolites such as terpenes that act as chemical defence and provide unique pharmaceutical oils. Genome sequencing of the E. grandis sister species E. globulus and a set of inbred E. grandis tree genomes reveals dynamic genome evolution and hotspots of inbreeding depression. The E. grandis genome is the first reference for the eudicot order Myrtales and is placed here sister to the eurosids. This resource expands our understanding of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.


A major opportunity for a sustainable energy and biomaterials economy in many parts of the world lies in a better understanding of the molecular basis of superior growth and adaptation in woody plants. Part of this opportunity involves species of Eucalyptus L'Hér, a genus of woody perennials native to Australia. The remarkable adaptability of eucalypts, coupled with their fast growth and superior wood properties, has driven their rapid adoption for plantation forestry in more than 100 countries across six continents (>20 million ha), making eucalypts the most widely planted hardwood forest trees in the world. The subtropical E. grandis and the temperate E. globulus stand out as targets of breeding programmes worldwide. Planted eucalypts provide key renewable resources for the production of pulp, paper, biomaterials and bioenergy, while mitigating human pressures on native forests. Eucalypts also have a large diversity and high concentration of essential oils (mixtures of mono- and sesquiterpenes), many of which have ecological functions as well as medicinal and industrial uses. Predominantly outcrossers with hermaphroditic animal-pollinated flowers, eucalypts are highly heterozygous and display pre- and postzygotic barriers to selfing that reduce inbreeding depression for fitness and survival.

To mitigate the challenge of assembling a highly heterozygous genome, we sequenced the genome of 'BRASUZ1', a 17-year-old E. grandis genotype derived from one generation of selfing. The availability of annotated forest tree genomes from two separately evolving rosid lineages, Eucalyptus (order Myrtales) and Populus (order Malpighiales), in combination with genomes from domesticated woody plants (for example, Vitis, Prunus, Citrus), provides a comparative foundation for addressing fundamental evolutionary questions related to the biology of woody perennials. Moreover, the unique palaeogeographic evolution of Eucalyptus, that is, its isolation from other members of the rosid clade, enables disentangling of the events that led to the modern members of the rosids by characterizing shared and unique whole-genome duplication events and syntenic gene space with other sequenced genomes. The draft genome of E. grandis suggests that the Eucalyptus genome has been shaped by an early lineage-specific genome duplication event and a subsequent high rate of tandem gene duplication.


Sequencing, assembly and annotation 


We assembled a non-redundant chromosome-scale reference (V1.0) sequence for BRASUZ1 based on 6.7× whole-genome Sanger shotgun coverage, paired bacterial artificial chromosome (BAC)-end sequencing and a high-density genetic linkage map (see Methods and Supplementary Information section 1). An estimated 94% of the genome is organized into 11 pseudomolecules (605 megabases (Mb), Fig. 1). Anchoring the genome assembly to an independent linkage map revealed that the remaining 4,941 smaller unanchored scaffolds (totalling 85 Mb) correspond largely to repeat-rich sequences and segments of alternative haplotypes of the assembled chromosomes derived from regions of residual heterozygosity in the otherwise inbred BRASUZ1 genome.

The E. grandis genome encodes a large number of predicted protein-coding loci (36,376), of which 89% are expressed in vegetative and reproductive tissues (Extended Data Fig. 1), plus various classes of non-coding genes (Supplementary Information section 2). Of the 36,376 predicted proteins, 30,341 (84%) are included in gene clusters shared with other rosid lineages (Extended Data Fig. 2). Retrotransposons account for the major portion of the genome (44.5%), with long terminal repeat retrotransposons being the most pervasive class (21.9%). DNA transposons encompass only 5.6% of the genome. For this class, Helitron elements are the most abundant, with an estimated 15,000 copies or 3.8% of the genome (Supplementary Information section 2).

Figure 1 | Eucalyptus grandis genome overview. Genome features in 1-Mb intervals across the 11 chromosomes. Units on the circumference show megabase values and chromosomes. a, Gene density (number per Mb, range 6-131). b, Repeat coverage (22-88% per Mb). c, Average expression state (fragments per kilobase of exon per million sequences mapped, FPKM, per gene per Mb, 6-41 per Mb). d, Heterozygosity in inbred siblings (proportion of 28 S1 offspring heterozygous at position, 0.39-0.93). e, Telomeric repeats. f, Tandem duplication density (2-50). g, h, Single nucleotide polymorphisms (SNPs) identified by resequencing BRASUZ1 in 1-Mb bins (g) and per gene (h, 11,656 genes); homozygous regions (~24%) and genes in green and heterozygous regions and genes in purple. Central blue lines connect gene pairs from the most recent whole-genome duplication event (Supplementary Data 1).


Genome evolution and phylogeny 

To address the phylogenetic position of Eucalyptus, we performed a genome-wide analysis of 17 sequenced plant genomes, generating a matrix of 697,423 aligned amino acid positions from 3,268 orthologue gene clusters (Methods and Supplementary Information section 3). Studies employing broad taxon sampling but a modest number of genes have consistently recovered two very well-supported clades of eurosids, the fabids and the malvids, and grouped Eucalyptus and other Myrtales with the malvids. Our analysis instead places Eucalyptus as a sister taxon to the eurosids (Extended Data Fig. 3) and supports the grouping of Populus and Jatropha (order Malpighiales) with malvids rather than fabids, in agreement with other recent whole-genome studies. The discrepancy between our genome-wide analyses and the angiosperm phylogeny group (APG) consensus highlights important methodological trade-offs between sampling more characters (as in our genome-wide study) versus more taxa (as per APG).

The evolutionary history of the Eucalyptus genome is marked by a lineage-specific palaeotetraploidy event newly revealed by our genomic analysis, superimposed on the earlier palaeohexaploidy event shared by all eudicots (Fig. 2). The whole-genome duplication (WGD) is estimated to have occurred ~109.9 (105.9-113.9) million years (Myr) ago (Supplementary Information section 3 and Extended Data Fig. 4) in a Gondwanan ancestor, around the time when Australia and Antarctica began to separate from East Gondwana. This WGD event is considerably older than those typically detected in other rosids and could have played a pivotal part in the evolution of the Myrtales lineage and its subsequent diversification from other rosid ancestors. The coincidence of the estimated WGD timing and the origin of the Myrtales leads to speculation that the WGD event could be directly related to the origin of the clade. More precise timing will require genomic analysis of other families and genera from the Myrtales.

The Eucalyptus genome exhibits substantial conservation of synteny with other rosids, as has been demonstrated for the basal rosid lineage represented by Vitis vinifera. Extending the method previously described, we identified 480 pairwise segments of conserved synteny between Eucalyptus and Vitis (Supplementary Information section 3). These segments include 68% of Eucalyptus genes and 76% of Vitis genes used in the analysis. The WGD in the Eucalyptus lineage relative to Vitis is clearly revealed by the 2:1 pattern in which two different Eucalyptus regions are typically collinear with a single region in Vitis. However, the gene content of these segments varies, as more than 95% of the paralogues in Eucalyptus have been lost subsequent to the WGD (a total of 5,896 Vitis genes have 6,158 synteny-confirmed orthologues in Eucalyptus). Half of the total length of the orthologous segments is contributed by segments longer than 1.83 Mb in Eucalyptus and 2.35 Mb in Vitis, suggesting that the loss of redundant genes after the WGD in Eucalyptus was accompanied by a compaction of those parts of the genome.
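Two of the figures in this paragraph can be reproduced with short calculations. The sketch below is a back-of-envelope check under a stated assumption, not the authors' pipeline. It assumes that each of the 5,896 Vitis genes would have two Eucalyptus copies if every WGD paralogue had been retained, so the 262 orthologues beyond one per Vitis gene count as retained pairs. It also shows the length-weighted median (N50-style) statistic behind the phrase 'half of the total length is contributed by segments longer than'. The segment lengths in that example are hypothetical.

```python
def paralogue_loss_fraction(vitis_genes, eucalyptus_orthologues):
    """Fraction of WGD paralogue pairs lost, assuming each Vitis gene maps to
    one Eucalyptus copy (paralogue lost) or two copies (paralogue retained)."""
    retained_pairs = eucalyptus_orthologues - vitis_genes
    return 1 - retained_pairs / vitis_genes


def length_weighted_median(lengths):
    """Largest length L such that segments of length >= L together cover at
    least half of the total length (the N50-style statistic)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty input")


loss = paralogue_loss_fraction(5896, 6158)
print(f"{loss:.1%}")  # 95.6%, consistent with 'more than 95%'

# Hypothetical segment lengths in Mb, for illustration only.
print(length_weighted_median([5.0, 4.0, 3.0, 2.0, 1.0]))  # 4.0
```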

Eucalyptus chromosome 3, the largest single chromosome in the 
Eucalyptus genome, is the only chromosome that does not contain 
inter-chromosomal segmental duplications (Fig. 1), having fused with 
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its WGD homologue. A similar situation occurs in Populus chromosome 
XVIII. Interestingly, Eucalyptus chromosome 3 and Populus chromo- 
some XVIII nearly exclusively contain the ancestral eudicot chromo- 
some 2 (Fig. 2c), despite their independent WGDs. There are no other 
examples among the currently sequenced dicotyledon genomes that 
contain a sole single copy of an ancestral chromosome. Moreover, in 
Eucalyptus and Populus, all other ancestral chromosomes appear to be 
dispersed and rearranged among the extant chromosomes (Fig. 2c). The 
conserved gene content and order (Supplementary Information section 3) 
on these chromosomes in two distantly related species could be due to: 
(1) convergent selection and positional stoichiometry of genes related to 
long-lived perennial woody habit that favours preservation of certain genes 
in syntenic order; and/or (2) merged ancestral chromosome structure (that 
is, multiple telomeres and centromeres on one chromosome) that sup- 
presses gene expression, recombination and/or successive rearrangement. 
Eucalyptus chromosome 3 has the lowest average gene expression metrics 
of any of the Eucalyptus chromosomes (Fig. 1c), favouring the second 
hypothesis. Consistent with the first hypothesis, however, there are several clusters of shared syntenic
genes that appear to be related to perennial habit, including homologues 
of NAM (no apical meristem, PF02365) and senescence-associated pro- 
tein (PF02365), several syntenic sets of disease-resistance genes, as well 
as genes related to cell-wall formation (Supplementary Data 2). 
Table 1 | Tandem duplicate statistics for selected plant genomes

Species                   Number of tandem     Total number of retained
                          expanded regions     tandem genes (%)
Physcomitrella patens         885               1,949 (6%)
Arabidopsis thaliana        1,821               5,038 (18%)
Populus trichocarpa         2,575               8,104 (18%)
Vitis vinifera              1,818               6,033 (23%)
Eucalyptus grandis          3,185              12,570 (34%)

Eucalyptus has more tandem duplicates and more tandem expanded regions (clusters) than other plant
genomes.


We also find that E. grandis has the largest number of genes in tandem
repeats (12,570, 34% of the total) reported among sequenced plant
genomes (Table 1 and Supplementary Information section 3). The low fre-
quency of contig breaks separating tandem gene pairs (Extended Data 
Fig. 5) and conserved gene order on independent BAC clones spanning 
two large tandem gene arrays (Supplementary Data 3 and Supplementary
Information section 1) support the accuracy of the assembly across highly
similar tandem copies. Tandem duplication often involves stress-response
genes that are retained in a lineage-specific fashion, suggesting that
tandem duplication is important for adaptive evolution in dynamically
changing environments. For example, more than 80% of the S-domain
receptor-like kinase (SDRLK) subfamily occurs in tandem arrays
(Supplementary Data 4). There also seems to be a bias in gene retention
following tandem duplication compared with segmental and whole-genome
duplication. Even within the genus Eucalyptus, tandem duplication appears
to be dynamic; for example, a cluster of MYB transcription factor genes in
E. globulus lacks four of the nine tandem duplicates found in E. grandis 
(Extended Data Fig. 6). 
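The counts in Table 1 depend on how a tandem array is delimited, and the exact criterion is not given in this section. As an illustration only, the Python sketch below assumes that every gene along a chromosome carries a gene-family label and that same-family genes separated by at most `max_gap` unrelated genes belong to one array; the function names, the labels and the `max_gap` rule are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def tandem_arrays(families: Sequence[str], max_gap: int = 0) -> List[List[int]]:
    """Group gene positions along one chromosome into tandem arrays.

    families[i] is the (hypothetical) gene-family label of the i-th gene in
    chromosomal order. A gene joins the open array of its family when at most
    `max_gap` genes lie between it and the previous member; otherwise it
    starts a new array. Only arrays with two or more genes are returned.
    """
    open_array: Dict[str, List[int]] = {}
    arrays: List[List[int]] = []
    for i, fam in enumerate(families):
        current = open_array.get(fam)
        if current is not None and i - current[-1] - 1 <= max_gap:
            current.append(i)
        else:
            current = [i]
            open_array[fam] = current
            arrays.append(current)
    return [a for a in arrays if len(a) >= 2]


def tandem_summary(families: Sequence[str], max_gap: int = 0) -> Tuple[int, int, float]:
    """Return (number of tandem arrays, genes in arrays, % of all genes, 1 decimal)."""
    arrays = tandem_arrays(families, max_gap)
    n_tandem = sum(len(a) for a in arrays)
    return len(arrays), n_tandem, round(100.0 * n_tandem / len(families), 1)


if __name__ == "__main__":
    toy = ["A", "A", "B", "C", "C", "C", "A"]
    print(tandem_arrays(toy))   # [[0, 1], [3, 4, 5]]
    print(tandem_summary(toy))  # (2, 5, 71.4)
```

Raising `max_gap` merges arrays interrupted by unrelated genes, so reported tandem counts are only comparable across genomes when the same rule is applied to each.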

Despite having the same number of chromosomes (n = 11) and highly
collinear genomes, eucalypts vary considerably in genome size. E.
grandis (640 Mb) and E. globulus (530 Mb) represent different sections
(Latoangulatae and Maidenaria) within the subgenus Symphyomyrtus,
estimated to have diverged within the past 36 million years. Resequencing of
the subtropical E. grandis (BRASUZ1) and a representative of the tem- 
perate E. globulus (‘X46’, Supplementary Information section 3) revealed 
that many small, non-transposable element (TE)-derived changes dis- 
tributed throughout the genome (164,813 regions; mean length 538 bp, 
median 230 bp, maximum 30,610 bp, total 88.7 Mb) account for nearly 
all of the genome size difference between the two species. Recent TE
activity accounts for only 2 Mb of the size difference. This contrasts with
other studies in closely related plant species that report a predominant
role for TEs in genome size evolution. Using sequence data from
other Eucalyptus species taxonomically positioned around the E. grandis— 
E. globulus split (J. Tibbits, unpublished data), we estimate that since 
divergence, E. grandis has gained 58 Mb and lost 12 Mb, while E. glo- 
bulus has gained 15 Mb and lost 24 Mb, suggesting more active genome 
size evolution than was apparent from previous estimates. 


Genetic load and heterozygosity 


Eucalypts are preferentially outcrossing, with late-acting post-zygotic
self-incompatibility resulting in outcrossing rates that can exceed 90%,
high levels of nucleotide variation, and the accumulation of genetic load
and expression of inbreeding depression. A microsatellite survey of
BRASUZ1 and its inbred siblings indicated putative hotspots of genetic 
load (Supplementary Information section 4). To investigate the distri- 
bution of preserved heterozygosity further, we resequenced an unrelated 
(outbred) E. grandis parental genotype M35D2 and 28 of its S1 offspring.
The offspring were genotyped using 308,784 high-confidence hetero- 
zygous sites (within 22,619 genes) identified in M35D2 (Methods and 
Supplementary Information section 4). Contrary to the Mendelian expectation
of 50% retained heterozygosity after selfing, we observed 52% to 79%
heterozygosity in the 28 S1 offspring (average 66%). On all chromosomes
except 5 and 11, heterozygosity was high (>80%) across long chromosome
segments, with peaks of >90% on chromosomes 6, 7 and 9 (Fig. 1d).
Despite the strong bias towards heterozygosity in these regions, a small pro- 
portion of either homozygous haplotype was always present, suggesting 
that there are genetic backgrounds in which homozygosity of any par-
ticular gene is not lethal. One exception is on chromosome 4, where a 
25-Mb region is completely devoid of one homozygous class across all 
surveyed genotypes (Extended Data Fig. 7 and Supplementary Informa- 
tion section 4). 
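The comparison with the Mendelian expectation can be made concrete. The Python sketch below assumes a simple integer genotype coding at sites heterozygous in M35D2 (a hypothetical coding, not the paper's file format). It computes one offspring's retained heterozygosity and reports which homozygous class, if any, never appears in a region across all offspring, as in the 25-Mb region on chromosome 4.

```python
from typing import Iterable, List, Sequence

# Selfing a heterozygote (Aa x Aa) gives 1/4 AA : 1/2 Aa : 1/4 aa, so half of
# the parental heterozygous sites are expected to stay heterozygous.
MENDELIAN_EXPECTATION = 0.5

# Hypothetical genotype coding at sites heterozygous in the parent:
# 0 = homozygous allele 1, 1 = heterozygous, 2 = homozygous allele 2, -1 = missing.
VALID = (0, 1, 2)


def retained_heterozygosity(calls: Sequence[int]) -> float:
    """Fraction of called parental-heterozygous sites still heterozygous in one offspring."""
    called = [g for g in calls if g in VALID]
    if not called:
        raise ValueError("no called sites")
    return sum(1 for g in called if g == 1) / len(called)


def absent_homozygous_classes(region_calls: Iterable[Sequence[int]]) -> List[int]:
    """Homozygous classes (0 and/or 2) never observed in a region across all offspring."""
    seen = {g for calls in region_calls for g in calls}
    return [h for h in (0, 2) if h not in seen]


if __name__ == "__main__":
    h = retained_heterozygosity([1, 1, 0, 2, 1, -1])
    print(f"{h:.2f} vs expected {MENDELIAN_EXPECTATION}")  # 0.60 vs expected 0.5
    print(absent_homozygous_classes([[1, 0, 1], [1, 1, 0]]))  # [2]
```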

The genetic architecture of genetic load and contribution of individual 
loci to inbreeding depression are largely unknown for woody perennials 
and present a barrier to rapid domestication via recurrent inbred mating. 
Our results suggest that a model of genome-wide cumulative effects of 
many small recessive alleles affecting overall fitness and survival best 
explains the architecture of inbreeding depression in Eucalyptus. This 
result is consistent with recent genome-wide selection experiments in
Eucalyptus showing that height growth is best explained by a multifactorial
model in which a few hundred small effects throughout the genome contribute
additively, in contrast to earlier suggestions, based on several biparental
QTL mapping studies, that relatively few loci of larger effect are
involved.


Lignocellulosic biomass production 


Whereas woody growth habit (the ability to produce radial secondary 
tissues from a vascular cambium) is polyphyletic, having appeared and 
disappeared multiple times across more than 30 diverse taxa, secondary
cell wall formation itself is highly conserved across vascular plants.
Large woody plants produce secondary cell walls on a vastly different
scale from that of herbaceous plants. Approximately 80% of woody
biomass comprises cellulose and hemicellulose, with the remaining biomass
composed primarily of lignin. A major determinant of industrial
processing efficiency lies in secondary cell wall ultrastructure, which
is dependent on interactions among these biopolymers. We identified 
putative functional homologues of genes encoding 18 enzymatic steps of 
cellulose and heteroxylan biosynthesis (Supplementary Information sec- 
tion 5). Despite the lineage-specific WGD event and the high number 
of genes in tandem duplications, relative and absolute expression levels 
(see Methods) suggest that most of the key enzymatic steps involve only
one or two functional homologues (Fig. 3), which are highly and spe- 
cifically expressed in xylem tissue. The xylem expression pattern of genes 
involved in sucrose catabolism suggests that Eucalyptus uses both direct 
(SUSY) and indirect (INV) pathways for the production of UDP-glucose 
(Fig. 3). Notably, the two sucrose synthase 4 homologues (Eucgr.C03199 
and Eucgr.C00769) are expressed at high levels in xylem tissue and account 
for 70% and 18% of total sucrose synthase expression, respectively. These 
genes, found on chromosome 3 in Eucalyptus, are part of a syntenic set 
of genes found on Populus chromosome XVIII, indicating that these 
genes pre-date the speciation events that separate these genera. There 
are 10 multigene families encoding phenylpropanoid biosynthesis genes 
that have expanded, mostly through tandem duplication, to include 174 
genes in E. grandis (Supplementary Information section 5). Phylogenetic 
analysis and expression profiling have allowed us to define a core set of 
24 genes, as well as five novel lignification candidates, preferentially and 
highly expressed in developing xylem (Extended Data Fig. 8 and Supple- 
mentary Information section 5). These results highlight the central role 
of tandem gene duplication in shaping functional diversity in Eucalyptus 
and suggest that subfunctionalization within these expanded gene fam- 
ilies has prioritized specific genes for wood formation. 


Secondary metabolites and oils 

It is generally thought that the extremely diverse array of secondary
metabolites observed within Eucalyptus defends against a comparably diverse
array of biotic pests, pathogens and herbivores encountered across its
natural range. Many of the defence compounds are terpenoid based,
including the commercially valuable eucalyptus oil, which is composed
largely of 1,8-cineole. The conjugation of terpenes with phloroglucinol
derivatives, as well as the formation of monoterpene glucose esters,
gives rise to the myriad defence compounds that vary across the genus.
E. grandis has the largest observed number of terpene synthase genes
among all sequenced plant genomes (n = 113, compared with a range of
n = 2 in Physcomitrella to 83 in Vitis; Fig. 4), as well as a marked expansion
of several phenylpropanoid gene families (Supplementary Information
section 5). Furthermore, a subgroup of R2R3-MYB transcription factor
genes known to be involved in the regulation of the phenylpropanoid
pathway in Arabidopsis is expanded by tandem duplication in Eucalyptus
to yield 16 genes with diverse expression profiles (Extended Data Fig. 9),
possibly associated with the wide range of phenylpropanoid-derived
compounds found in Eucalyptus.

Figure 3 | Genes involved in cellulose and xylan biosynthesis in
wood-forming tissues of Eucalyptus. Relative (yellow-blue scale) and absolute
(white-red scale) expression profiles of secondary cell-wall-related genes
implicated in cellulose and xylan biosynthesis. Sugar and polymer
intermediates are shown in green, while the proteins (enzymes) involved in
each step are shown in blue. Detailed protein names, annotation and
mRNA-seq expression data are provided in Supplementary Data 5. ST, shoot tips;
YL, young leaves; ML, mature leaves; FL, floral buds; RT, roots; PH, phloem,


Reproductive biology 


The genus Eucalyptus is named for its unusual floral structure: the name
derives from the Greek eu- (well) and kaluptos (covered), referring to the
operculum that covers the floral buds before anthesis. The ability to produce
large amounts of pollen and seed over long generation times increases
the reproductive success of woody perennials and affects adaptation
and population genetics. Interestingly, the evolution and genetic control
of the unique floral structure in Eucalyptus may be reflected in the ex- 
pansion and deletion of genes typically associated with floral structure 
(for example, the APETALA1/FRUITFUL-like clade, Supplementary 
Information section 7). SOC1, a type II MADS-box gene that integrates
multiple signals related to initiation of flowering, including long days,
vernalization and pathways related to gibberellin signalling, has been
markedly expanded in E. grandis compared to other angiosperms (Extended
Data Fig. 10). Eucalyptus is a diverse genus of over 700 species
distributed across a wide range of environments, including tropical,
subtropical and temperate forests. This environmental heterogeneity
encompasses extensive variation in the onset, season and intensity of flowering.
Because of SOC1’s diverse roles in environmental control of flowering,
the expansion and subfunctionalization of the SOC1 subfamily may have 
contributed to the evolutionary diversification of Eucalyptus by integrat- 
ing multiple signals into flowering responses relevant to different geo- 
graphical zones. Eucalyptus may thus provide a model for the evolution 
of responses to divergent sets of flowering cues required for wide col- 
onization and speciation. 


Conclusions and future directions 


Figure 4 | Interspecific phylogenetic analysis and

The availability of a high-quality reference represents a timely step
forward in fundamental studies of adaptation across the diversity of habitats
occupied by eucalypt species. The unique biology and evolutionary
history of Eucalyptus are reflected in its genome; for example, the expansion
of terpene synthesis genes and the large number of tandem repeats, 
respectively. The coincidence of a lineage-specific WGD with the origin 
of the Myrtales reinforces the proposed role of genome duplication in 
angiosperm evolution and underscores the value of additional genome 
sequencing of families and genera in this important rosid lineage. Future 
studies of variation in functional genes will provide insights into the rela- 
tive influences of drift and selection on Eucalyptus evolution and identify 
mechanisms of speciation and adaptive divergence. Such insight will lead 
to improved understanding of the response of eucalypts to environmental 
change. Comparative analysis of the E. grandis genome with those of other 
large perennials will add crucial insights into the evolutionary innovations 
that have made eucalypts keystone species that shape biodiversity in 
diverse ecosystems. The prospect of accelerating breeding cycles for 
productivity and wood quality via genomic prediction of complex traits
and association genetics is enhanced by the release of the Eucalyptus 
genome. Genome-enabled derivation of an integrative data framework 
based on large-scale genotypic and phenotypic data sets will offer in- 
creasingly valuable insights into the complex connections between indi- 
vidual genomic elements and the extraordinary phenotypic variation in 
Eucalyptus. 


METHODS SUMMARY 


We used whole-genome shotgun sequencing (6.7× final sequence coverage from
7.7 million Sanger reads) followed by assembly in Arachne v.20071016 (ref. 36) and
high-density genetic linkage mapping to produce chromosome-scale pseudomolecule
sequences of the 11 nuclear chromosomes of BRASUZ1. Protein-coding loci
were identified using homology-based FgenesH and GenomeScan predictions and
~260,000 PASA EST assemblies from E. grandis and sister species. Gene family
clustering was performed with the Inparanoid algorithm and peptide sequences
analysed with Interproscan, SignalP, Predotar, TMHMM and orthology-based
projections from Arabidopsis. We performed maximum-likelihood-based phylogen- 
etic reconstruction of the green plant phylogeny based on 174,020 peptides encoded 
by single copy orthologous genes from 17 plant genomes. Protein domains and domain 
arrangements were analysed to identify a core set of domains and arrangements 
present in rosid lineages represented by Eucalyptus, Vitis, Populus and Arabidopsis. 
Genome-wide gene expression profiling was performed using Illumina RNA-seq 
analysis of seven developing tissues from E. grandis. We identified whole-genome 
duplications using an approach based on paralogue- and orthologue-specific
comparisons of the 36,376 predicted protein-coding genes and further refined the
estimated age of the lineage-specific WGD event using a phylogenetic dating
approach. Genome synteny between E. grandis and P. trichocarpa was evaluated
using the VISTA pipeline infrastructure. Genome resequencing of E. grandis
(BRASUZ1) and sister species E. globulus (X46) genomes was performed with
Illumina PE100 DNA sequencing. Lignin, cellulose, xylan, terpene and flowering- 
related gene families were analysed using a combination of gene annotation, phylo- 
genetic analysis and mRNA-seq expression profiling. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Whole-genome shotgun sequencing and assembly. All sequencing reads were 
collected with standard Sanger sequencing protocols on ABI 3730XL automated se- 
quencers at the Joint Genome Institute, Walnut Creek, CA. Three different sized 
libraries were used for the plasmid subclone sequencing process and paired-end 
sequencing. A total of 3,446,208 reads from the 2.6-kb sized libraries, 3,479,232 reads 
from the 6.0-kb sized libraries and 518,016 reads from a 36.2-40.6-kb library were 
sequenced. Two BAC libraries (EG_Ba, 127.5-kb insert and EG_Bb, 155.0-kb insert) 
were end sequenced to add an additional 294,912 reads for long-range linking. 
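As a quick consistency check, the four read counts listed in this paragraph sum to the roughly 7.7 million Sanger reads quoted in the Methods Summary. The dictionary labels are descriptive only.

```python
# Sanger read counts per library type, as listed in this section.
reads = {
    "2.6-kb plasmid libraries": 3_446_208,
    "6.0-kb plasmid libraries": 3_479_232,
    "36.2-40.6-kb library": 518_016,
    "BAC ends (EG_Ba, EG_Bb)": 294_912,
}

total = sum(reads.values())
print(f"{total:,} reads")  # 7,738,368 reads
```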

The sequence reads were assembled using a modified version of Arachne v.20071016 
(ref. 36) with parameters maxcliql = 100, correct1_passes = 0, n_haplotypes = 2 and
BINGE_AND_PURGE = True. The resulting output was then passed through Rebuil- 
der and SquashOverlaps with parameters to merge adjacent assembled alternative 
haplotypes and subsequently run through another complete Arachne assembly pro- 
cess to finalize the assembly. This produced 6,043 scaffold sequences, with a scaffold 
L50 of 4.9 Mb and total scaffold size of 692.7 Mb. Scaffolds were screened against 
bacterial proteins, organelle sequences, GenBank nr and were removed if found to be 
a contaminant. Additional scaffolds were removed if they (1) consisted of >95% of 
base pairs that occurred as 24mers four other times in the scaffolds larger than 50 kb; 
(2) contained a majority of unanchored RNA sequences; or (3) were less than 1 kb in 
length. 
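The scaffold L50 of 4.9 Mb quoted above is a length statistic. The Python sketch below assumes the JGI-style convention in which "L50" names the length L such that scaffolds of length L or longer contain at least half of the assembled sequence (many other tools call this N50). It also applies the "<1 kb" removal criterion first; the toy lengths are invented.

```python
from typing import Iterable, List


def l50_length(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of the total.

    Assumes the JGI-style convention in which 'L50' names this length
    (many other tools call the same quantity N50).
    """
    sizes = sorted(lengths, reverse=True)
    total = sum(sizes)
    if total == 0:
        raise ValueError("empty assembly")
    running = 0
    for size in sizes:
        running += size
        if 2 * running >= total:
            return size
    raise AssertionError("unreachable")  # the loop always returns when total > 0


def drop_short(lengths: Iterable[int], min_len: int = 1_000) -> List[int]:
    """Apply the '<1 kb' removal criterion before recomputing L50."""
    return [n for n in lengths if n >= min_len]


if __name__ == "__main__":
    toy = [10_000, 8_000, 5_000, 3_000, 2_000, 500]
    print(l50_length(drop_short(toy)))  # 8000
```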

For chromosome-scale pseudomolecule construction, markers from the genetic 
map were placed using two methods. SSR-based markers were placed using three 
successive rounds of e-PCR with N = 0, N = 1 and N = 3. Markers that had sequence
associated with them, including SNP markers, were placed with BLAT and blastn.
A total of 19 breaks (16 in high-coverage (>6×) and 3 in low-coverage (≤6×) regions) were made
in scaffolds based on linkage group discontiguity; a subset of the broken scaffolds 
were combined using 257 joins to form the 11 pseudomolecule chromosomes. Map 
joins were denoted with 10,000 repeats of the letter N (Ns). The pseudomolecules 
contained 605.9 Mb out of 691.3 Mb (88%) of the assembled sequence. The final 
assembly contains 4,952 scaffolds with a contig L50 of 67.2 kb and a scaffold L50 of 
53.9 Mb. The completeness of the resulting assembly was estimated using 1,007,962 
ESTs from BRASUZ1. The goal of this analysis was to obtain an estimate of the com- 
pleteness of the assembly, rather than to do a comprehensive examination of gene 
space. Briefly, ESTs <300 bp were removed, along with chloroplast, mitochondrial 
or rDNA ESTs. All duplicate ESTs were placed against the genome using BLAT.
The remaining ESTs were screened for alignments that had ≥90% identity and ≥85%
EST coverage. The screened alignments indicated that 98.98% of available expressed
gene loci were included in the 11 chromosome assemblies. 
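The completeness screen above can be sketched as a chain of filters. In the Python sketch below, the record layout (`EstAlignment`) and function names are hypothetical. For brevity it counts individual alignments rather than grouping ESTs into expressed gene loci as the paper's 98.98% figure does, so it illustrates the thresholds, not the exact statistic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EstAlignment:
    """Best genome alignment of one EST (hypothetical record layout)."""
    est_length: int          # bp
    identity: float          # percent identity
    coverage: float          # percent of the EST covered by the alignment
    on_chromosome: bool      # aligned within one of the 11 pseudomolecules
    organellar_or_rdna: bool = False


def passes_screen(a: EstAlignment, min_len: int = 300,
                  min_identity: float = 90.0, min_coverage: float = 85.0) -> bool:
    """Length, organelle/rDNA, identity and coverage filters described in the text."""
    return (a.est_length >= min_len and not a.organellar_or_rdna
            and a.identity >= min_identity and a.coverage >= min_coverage)


def fraction_on_chromosomes(alignments: Iterable[EstAlignment]) -> float:
    """Share of screened alignments that fall on the chromosome assemblies."""
    kept = [a for a in alignments if passes_screen(a)]
    if not kept:
        return 0.0
    return sum(a.on_chromosome for a in kept) / len(kept)
```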

Gene prediction. To produce the current gene set, we used the homology-based 
FgenesH and GenomeScan predictions. The best gene prediction at each locus was 
selected and integrated with EST assemblies using the PASA program. The gene set
shown in the browser was generated from the input gene models at JGI. The gene
prediction pipeline was structured as follows: peptides from diverse angiosperms
and ~260,000 EST assemblies (from ~2.9 million filtered E. grandis ESTs and ~2.4 million
EST sequences from other closely related (‘sister’) Eucalyptus species, assembled with
PASA) were aligned to the genome and their overlaps used to define putative
protein-coding gene loci. The corresponding genomic regions were extended by 1 kb in each
direction and submitted to FgenesH and GenomeScan, along with related angiosperm
peptides and/or ORFs from the overlapping EST assemblies. These two sets of
predictions were integrated with expressed sequence information using PASA
against ~260,000 Eucalyptus EST assemblies. The results were filtered to remove
genes identified as transposon-related. 
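The locus-definition step (overlapping alignments defining a locus, then a 1-kb extension in each direction) is essentially interval merging plus padding. The Python sketch below assumes half-open, 0-based coordinates on a single chromosome; the function name is hypothetical, and the downstream FgenesH/GenomeScan runs are not shown.

```python
from typing import List, Sequence, Tuple


def candidate_loci(alignments: Sequence[Tuple[int, int]], chrom_length: int,
                   flank: int = 1_000) -> List[Tuple[int, int]]:
    """Merge overlapping alignments into loci and extend each locus by `flank` bp.

    alignments: (start, end) half-open, 0-based intervals of peptide/EST
    alignments on one chromosome. Padded loci are clipped to the chromosome
    ends; neighbouring loci may overlap after padding (not re-merged here).
    """
    merged: List[List[int]] = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or touching: extend
        else:
            merged.append([start, end])
    return [(max(0, s - flank), min(chrom_length, e + flank)) for s, e in merged]


if __name__ == "__main__":
    hits = [(5_000, 6_000), (5_500, 7_000), (20_000, 21_000), (100, 300)]
    print(candidate_loci(hits, chrom_length=21_500))
    # [(0, 1300), (4000, 8000), (19000, 21500)]
```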

Gene family cluster and gene ontology analysis. The Inparanoid algorithm 
was used to identify orthologous and paralogous genes that arose through duplica- 
tion events. Clusters were determined using a reciprocal best pair match and then an 
algorithm for adding in-paralogues was applied. The peptide sequences used were 
from Arabidopsis lyrata, Arabidopsis thaliana, Brachypodium distachyon, Caenorha- 
bditis elegans, Chlamydomonas reinhardtii, Danio rerio, Ectocarpus siliculosus, Esche- 
richia coli, Eucalyptus grandis, Fragaria vesca, Glycine max, Homo sapiens, Jatropha 
curcas, Mus musculus, Neurospora crassa, Nostoc punctiforme, Oryza sativa, Phoenix 
dactylifera, Physcomitrella patens, Populus trichocarpa, Saccharomyces cerevisiae, Schi- 
zosaccharomyces pombe, Selaginella moellendorffii, Solanum tuberosum, Sorghum 
bicolor, Synechocystis pcc6803, Theobroma cacao, Vitis vinifera and Zea mays. The 
sequences were downloaded from Gramene, Phytozome (http://www.phytozome.net)
and Ensembl (http://www.ensembl.org). A functional annotation pipeline similar
to the one used for strawberry genome annotation was used to infer Gene
Ontology assignments to 29,841 protein-coding genes (~82%). Peptide sequences
were analysed through an integrated approach involving Interproscan, SignalP,
Predotar, TMHMM and orthology-based projections from Arabidopsis.
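Inparanoid seeds each cluster with a reciprocal best match between two proteomes and then adds in-paralogues. The Python sketch below illustrates only the seeding step, using hypothetical similarity scores in place of the BLAST-based scores Inparanoid works from; the in-paralogue rule and confidence values are not reproduced.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # query gene -> {target gene: similarity score}


def best_hits(scores: Scores) -> Dict[str, str]:
    """Top-scoring target for each query (ties go to the first target listed)."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_pairs(a_to_b: Scores, b_to_a: Scores) -> List[Tuple[str, str]]:
    """Gene pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    a_to_b = {"a1": {"b1": 100, "b2": 50}, "a2": {"b1": 90, "b2": 80}}
    b_to_a = {"b1": {"a1": 100, "a2": 90}, "b2": {"a2": 80, "a1": 50}}
    print(reciprocal_best_pairs(a_to_b, b_to_a))  # [('a1', 'b1')]
```

Here `a2` is not paired with `b2` even though `b2` prefers it, because `a2`'s own best hit is `b1`; such genes are the ones a later in-paralogue step may attach to an existing cluster.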

Green plant phylogeny. We used an integrated approach of gene orthology clustering
and an automated workflow for phylogenomic analyses to reconstruct land plant
phylogeny from peptide sequences. A total of 174,020 peptides encoded by single-copy
protein-coding orthologous nuclear genes from 17 plant genomes (Supplementary
Data 7) were identified, aligned and assembled into a supermatrix. Conservative and
liberal superalignments retained 42.26% (697,423 amino acids) and 46.35% (764,978
amino acids), respectively, of the original 1,650,340-amino-acid concatenated alignment
for maximum likelihood phylogenetic reconstruction (Supplementary Data 7). These
alignments come from 3,268 orthologous gene clusters, each carrying single-copy genes
from at least 9 of the 17 species (that is, at least half).
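The retention percentages follow directly from the column counts, and the threshold of 9 species is the smallest whole number that is at least half of 17. The Python sketch below checks both. `keep_cluster` encodes one reading of the cluster filter (every represented species single-copy, at least 9 species present); the exact filtering rule is not spelled out in this section, and the species labels are hypothetical.

```python
import math
from typing import Mapping

N_GENOMES = 17
MIN_SPECIES = math.ceil(N_GENOMES / 2)  # 9: "at least half" of the 17 genomes
ALIGNMENT_COLUMNS = 1_650_340            # concatenated alignment length (amino acids)


def keep_cluster(genes_per_species: Mapping[str, int]) -> bool:
    """One reading of the cluster filter: every species present is single-copy,
    and at least MIN_SPECIES species are present."""
    present = [n for n in genes_per_species.values() if n > 0]
    return len(present) >= MIN_SPECIES and all(n == 1 for n in present)


def retained_percent(kept_columns: int) -> float:
    """Percentage of the concatenated alignment kept after trimming."""
    return round(100 * kept_columns / ALIGNMENT_COLUMNS, 2)


if __name__ == "__main__":
    print(retained_percent(697_423), retained_percent(764_978))  # 42.26 46.35
```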

Protein domain analysis. Domains and domain arrangements were compared
within the rosids to distinguish a core set of domains and arrangements present in all
rosids and those shared by one or more of the four rosid lineages included in the
analysis (Supplementary Data 8). Domains occurring at twice the frequency in Eucalyptus
compared to the average abundance in the rosids were defined as overrepresented.
If several splice variants were present for one protein, we excluded all but the longest
transcript. All proteomes were scanned for domains with the Pfam_scan utility and
HMMER 3.0 against the Pfam-A and Pfam-B databases. For the annotation of Pfam-A
domains, we used the model-defined gathering threshold and query sequences were
required to match at least 30% of the defining model. Pfam-B domains were annotated
using an e-value cutoff of 10-*. When possible, Pfam-A domains were mapped
to clans and consecutive stretches of the same domain were collapsed into one large
pseudo-domain. We defined domain arrangements as ordered sets of domains for
each protein. For the analysis of arrangements, only Pfam-A domains were used.
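Three of the rules above (longest isoform per gene, collapsing consecutive repeats into an ordered arrangement, and the twofold overrepresentation test) can be sketched in a few lines of Python. Two assumptions are worth flagging: the rosid average here includes Eucalyptus itself, and raw domain counts stand in for frequencies. Either choice may differ from the paper's. Domain and lineage names are placeholders.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def longest_isoforms(isoforms: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only the longest protein sequence per gene (other splice variants dropped)."""
    return {gene: max(seqs, key=len) for gene, seqs in isoforms.items() if seqs}


def arrangement(domains: Sequence[str]) -> Tuple[str, ...]:
    """Ordered domain arrangement, collapsing consecutive repeats into one pseudo-domain."""
    collapsed: List[str] = []
    for d in domains:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    return tuple(collapsed)


def overrepresented(counts: Dict[str, Counter], focal: str, fold: float = 2.0) -> List[str]:
    """Domains whose count in `focal` is at least `fold` times the mean count across all lineages."""
    domains = set().union(*counts.values())
    flagged = []
    for d in sorted(domains):
        avg = mean(c[d] for c in counts.values())
        if avg > 0 and counts[focal][d] >= fold * avg:
            flagged.append(d)
    return flagged
```

Collapsing only consecutive repeats keeps `("PF1", "PF2", "PF1")` distinct from `("PF1", "PF2")`, so arrangements remain ordered sets as defined in the text.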

Genome-wide mRNA expression profiling. To study the expression of predicted
protein-coding and ncRNA genes, RNA-seq reads obtained from Illumina sequencing
of seven Eucalyptus tissues (that is, shoot tips, young leaf, mature leaf, flower, roots,
phloem and immature xylem; http://www.eucgenie.org/; Hefer et al., unpublished
data) were mapped to the Eucalyptus genome using TopHat, with the Bowtie
algorithm performing the alignment. The aligned read files were processed
by Cufflinks, with RNA-seq fragment counts (that is, fragments per kilobase of
exon per million fragments mapped (FPKM)) used to measure the relative abundance
of transcripts. Differential ncRNA expression between the seven Eucalyptus tissues 
was determined using Cuffdiff. 
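FPKM normalizes a fragment count by transcript length (in kilobases) and by library size (in millions of mapped fragments). The Python sketch below uses plain exon length. Cufflinks itself uses a fragment-length-corrected effective length, so its values will differ somewhat for short transcripts. The numbers in the example are invented.

```python
def fpkm(fragments: float, exon_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    FPKM = fragments / (exon_length_bp / 1e3) / (total_mapped_fragments / 1e6)
         = fragments * 1e9 / (exon_length_bp * total_mapped_fragments)
    Plain exon length is used here; Cufflinks uses an effective length.
    """
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 500 fragments on a 2-kb transcript in a library of 25 million mapped fragments
    print(fpkm(500, 2_000, 25_000_000))  # 10.0
```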

ncRNA analysis. To predict ncRNAs in Eucalyptus, the genome sequence was 
scanned using Infernal with the covariance models (that is, a combination of
sequence consensus and RNA secondary structure consensus) of 1,973 RNA families
in the Rfam database v10.1 (refs 64, 65). The bit score cutoff of the Infernal search
was set to the TC value, the trusted cutoff defined by Rfam curators.
The Infernal search result was further filtered by an e-value cutoff of 0.01. To examine
the ncRNA conservation between Eucalyptus and other plant genomes, the
Eucalyptus ncRNA candidate sequences obtained from the Infernal search were used
as queries to search against the genome sequences listed above using BLAT with a
minimum coverage (that is, minimum fraction of query that must be aligned) of 80% 
and a minimum identity of 60%. 
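The two-stage ncRNA filtering can be expressed as a pair of predicates. The Python sketch below uses a hypothetical record type rather than Infernal's actual output format. It treats both cutoffs as inclusive, which is an assumption, since the text does not say whether hits exactly at the boundary were kept.

```python
from dataclasses import dataclass


@dataclass
class InfernalHit:
    """Minimal stand-in for one covariance-model hit (hypothetical record layout)."""
    family: str
    bit_score: float
    evalue: float
    tc_cutoff: float  # family-specific trusted cutoff (TC) from the Rfam model


def keep_infernal_hit(hit: InfernalHit, max_evalue: float = 0.01) -> bool:
    """Bit score at or above the family TC cutoff and e-value within the 0.01 cutoff."""
    return hit.bit_score >= hit.tc_cutoff and hit.evalue <= max_evalue


def conserved_in_genome(query_length: int, aligned_length: int, identity_pct: float,
                        min_coverage: float = 0.80, min_identity: float = 60.0) -> bool:
    """BLAT-style conservation test: >= 80% of the query aligned at >= 60% identity."""
    return aligned_length / query_length >= min_coverage and identity_pct >= min_identity
```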

5’ UTR empirical curation. Approximately 2.9 million E. grandis ESTs and ~700
million RNA-seq reads from seven diverse tissues were used to empirically curate 5’
UTR annotations. At each locus, the predicted, EST-derived and RNA-seq-derived 5’ UTR
lengths were compared. An empirical annotation was prioritized over an in silico
prediction and the longest empirical transcript was preferred. Loci with
a 5’ UTR reported only by FgenesH retained that annotation as the best current
annotation. 
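The priority rule reads naturally as a small decision function. The Python sketch below works on UTR lengths only, which is a simplifying assumption; a real curation step would compare transcript coordinates at each locus. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def curate_5utr(predicted: Optional[int], est: Sequence[int],
                rnaseq: Sequence[int]) -> Tuple[Optional[int], str]:
    """Choose a 5' UTR length (bp) for one locus.

    Empirical evidence (EST or RNA-seq) is preferred over the in silico
    prediction, and the longest empirical UTR wins; a prediction-only
    locus keeps its predicted UTR.
    """
    empirical = [n for n in (*est, *rnaseq) if n > 0]
    if empirical:
        return max(empirical), "empirical"
    if predicted:
        return predicted, "predicted"
    return None, "none"


if __name__ == "__main__":
    print(curate_5utr(120, est=[80, 150], rnaseq=[200]))  # (200, 'empirical')
    print(curate_5utr(120, est=[], rnaseq=[]))            # (120, 'predicted')
```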

Genome evolution. We used the E. grandis genome sequence information
(http://www.phytozome.net/eucalyptus.php) to unravel the Myrtales evolutionary
palaeohistory leading to the modern Eucalyptus genome structure of 11 chromosomes.
Independent intraspecific (that is, paralogue) and interspecific (that is, orthologue)
comparisons were necessary to infer gene relationships between Eucalyptus and the
other rosid genomes. We applied a robust and direct approach allowing the
characterization of genome duplications by aligning the available genes (36,376)
against themselves with stringent alignment criteria and statistical validation.

We used the VISTA pipeline infrastructure for the construction of genome-wide
pairwise DNA alignments between E. grandis and Populus trichocarpa. To align
genomes we used a combination of global and local alignment methods. First, we
obtained an alignment of large blocks of conserved synteny between the two species
by applying the Shuffle-LAGAN global chaining algorithm to local alignments
produced by translated BLAT. After that we used Supermap, the fully symmetric whole-
genome extension to the Shuffle-LAGAN. Then, in each syntenic block we applied 
Shuffle-LAGAN a second time to obtain a more fine-grained map of small-scale 
rearrangements such as inversions. 

Syntenic regions between Eucalyptus chromosome 3 and Populus chromosome 
XVIII were defined as segments of contiguous sequence. Each contiguous block of 
DNA was annotated and cross-compared between the two species. Gene models with- 
in the syntenic blocks were compared based on a sliding window representing 10 
gene models with an allowance of two intercalated gene models. Genes occurring in 
tandem repeats on either the Eucalyptus or Populus chromosomes were counted as a 
single locus in either case. The constructed genome-wide pair-wise alignments can 
be downloaded from http://pipeline.lbl.gov/downloads.shtml and are accessible for
browsing and various types of analysis through Phytozome (http://phytozome.org). 
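The sliding-window comparison between Eucalyptus chromosome 3 and Populus chromosome XVIII admits more than one implementation. The Python sketch below is one plausible reading, not the authors' code. Tandem copies are collapsed to one locus, and a window of 10 Eucalyptus loci passes when at most two loci lack a Populus counterpart and the matched loci keep a consistent (forward or inverted) order. The inputs are hypothetical orthologue-group labels.

```python
from typing import List, Sequence


def collapse_tandems(loci: Sequence[str]) -> List[str]:
    """Count adjacent copies of the same gene family as a single locus."""
    out: List[str] = []
    for x in loci:
        if not out or out[-1] != x:
            out.append(x)
    return out


def syntenic_windows(euc: Sequence[str], pop: Sequence[str],
                     window: int = 10, max_intercalated: int = 2) -> List[int]:
    """Start positions of windows of `window` Eucalyptus loci that match the Populus block.

    euc, pop: ordered orthologue-group labels along each syntenic block (hypothetical input).
    A window passes when at most `max_intercalated` loci lack a Populus counterpart and
    the matched loci keep a consistent order (forward or inverted) in Populus.
    """
    e = collapse_tandems(euc)
    pos = {x: i for i, x in enumerate(collapse_tandems(pop))}
    starts = []
    for i in range(len(e) - window + 1):
        idx = [pos[x] for x in e[i:i + window] if x in pos]
        ordered = idx == sorted(idx) or idx == sorted(idx, reverse=True)
        if window - len(idx) <= max_intercalated and ordered:
            starts.append(i)
    return starts


if __name__ == "__main__":
    euc = ["A", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
    pop = list("ABCDEFGH") + ["X", "Y"]
    print(syntenic_windows(euc, pop))  # [0]
```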

For comparative analysis of the E. grandis and E. globulus genomes, enriched nuclei
were extracted using a modified BAC library preparation protocol and DNA was
extracted following Tibbits et al. DNA was prepared for sequencing using Illumina
TruSeq kits and 100-bp paired-end sequencing was performed on a HiSeq2000.
NUCLEAR software (Gydle Inc.) was used to filter for high-quality reads that were
then mapped to the E. grandis genome scaffold assembly. The VISION software was
used to visualize assemblies and assembly metrics were computed using custom Perl,
R and Shell scripts.

Genome function analysis. Using homology to Arabidopsis genes and Pfam domain
analysis we identified candidate homologues for lignin, cellulose and xylan biosynthetic
genes. All possible family members were identified and their gene expression
evaluated in seven developing tissues of E. grandis using Illumina RNA-seq analysis.
In particular, we analysed each gene’s expression relative to other family members/
isoforms in xylem, as well as relative to the median (~90,000 FPKM) of xylem
expression in the entire transcriptome. Considering each gene’s relative and absolute
expression levels, all members expressed above the median in xylem were noted
(Supplementary Data 5 and Supplementary Data 9). Similarly, a search for conserved protein
motifs for the terpene synthase gene family was conducted in eight plant genomes, in- 
cluding E. grandis (Supplementary Information section 6 and Supplementary Data 6). 
The amino acid sequences were aligned and truncated to compare homologous sites. 
A maximum likelihood tree was created, rooted by the split between two major types 
of terpene synthase genes, and nodes were coloured by species. 
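The relative and absolute criteria used for the cell-wall gene families can be combined in one pass. The Python sketch below keeps family members whose xylem FPKM exceeds the transcriptome-wide xylem median and reports each retained gene's share of the family's total xylem expression, the kind of share quoted for the two sucrose synthase 4 homologues in the main text. Gene names and values are hypothetical toy data, not Supplementary Data 5 or 9.

```python
from statistics import median
from typing import Dict, Iterable


def xylem_candidates(family_fpkm: Dict[str, float],
                     transcriptome_xylem_fpkm: Iterable[float]) -> Dict[str, float]:
    """Family members expressed above the transcriptome-wide xylem median.

    Returns each retained gene's share of the family's total xylem FPKM
    (relative expression), rounded to three decimals.
    """
    cutoff = median(transcriptome_xylem_fpkm)
    total = sum(family_fpkm.values())
    return {g: round(v / total, 3) for g, v in family_fpkm.items()
            if total > 0 and v > cutoff}


if __name__ == "__main__":
    family = {"geneA": 70.0, "geneB": 18.0, "geneC": 12.0}  # toy values
    print(xylem_candidates(family, [1.0, 5.0, 15.0, 20.0, 25.0]))
    # {'geneA': 0.7, 'geneB': 0.18}
```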
Extended Data Figure 1 | RNA-seq-based expression evidence for predicted
Eucalyptus grandis gene models. Gene expression was assessed with Illumina
RNA-seq analysis (240 million RNA sequences from six tissues, mapped to
36,376 E. grandis genes, V1.1 annotation). Genes were counted as expressed in a
tissue if an FPKM of at least 1.0 was observed in the tissue. A total of 23,485
gene models (64.6%) were detected in all six tissues compared here and 32,697
(89.9%) in at least one of the six tissues. Expression profiles for individual genes
are accessible in the Eucalyptus Genome Integrative Explorer (EucGenIE,
http://www.eucgenie.org/).
Extended Data Figure 2 | Sharing of protein-coding gene families, protein
domains and domain arrangements in Eucalyptus, Arabidopsis, Populus
and Vitis. a, The four rosid lineages have a total of 16,048 protein-coding gene
clusters (from a total of 35,118 identified in 29 sequenced genomes; see
Methods and Supplementary Information section 3) of which a core set of 6,926
clusters are shared among all four lineages. Of the 36,376 high-confidence
annotated gene models in E. grandis, 30,341 (84%) are included in 10,049
clusters. E. grandis has 851 unique gene clusters (that is, not shared with any of
the three other rosid genomes, but shared with at least one other of the 29
genomes). b, A total of 3,160 Pfam-A domains are shared among the four rosid
lineages, the majority of which are single-domain arrangements (3,138 shared
among the four lineages). Thirteen Pfam-A domains were only detected in
Eucalyptus and 392 domain arrangements are specific to Eucalyptus in this
four-way comparison.
[Figure panels: a, Gene Family Clusters Across Tree of Life; b, number of unique Pfam-A domains (number of domain arrangements).]
Extended Data Figure 3 | Green plant phylogeny based on shared gene
clusters from 17 sequenced plant genomes. The phylogenetic tree was
generated by RAxML analysis of a concatenated MUSCLE alignment, trimmed
with Gblocks using liberal settings, of protein clusters that include at least one
protein from at least half of the species (Supplementary Data 7). The
corresponding bootstrap partitions are shown at each node. The tree was rooted
with Physcomitrella (a moss) as the outgroup. The Myrtales lineage, represented by
Eucalyptus grandis, is supported as sister to the fabid and malvid (core rosid)
clades, together with the basal rosid lineage Vitales, whereas Populus trichocarpa
(Malpighiales) is grouped with the malvids.
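The cluster-selection rule in this legend (at least one protein from at least half of the species) and the concatenation step can be sketched as below. For simplicity the sketch assumes each cluster is already aligned with one sequence per species; the MUSCLE alignment, Gblocks trimming and RAxML search are not reproduced, and the cluster and species identifiers are hypothetical.

```python
def select_clusters(clusters: dict[str, dict[str, str]], species: list[str]) -> list[str]:
    """Keep clusters that contain a protein from at least half of the species."""
    min_species = len(species) / 2
    return [cid for cid, seqs in clusters.items()
            if sum(1 for sp in species if sp in seqs) >= min_species]


def concatenate(clusters: dict[str, dict[str, str]], species: list[str],
                selected: list[str]) -> dict[str, str]:
    """Concatenate per-cluster alignments into one supermatrix row per species.
    A species absent from a cluster is padded with gap characters."""
    rows: dict[str, list[str]] = {sp: [] for sp in species}
    for cid in selected:
        aln = clusters[cid]
        width = len(next(iter(aln.values())))  # aligned sequences share one width
        for sp in species:
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


# Hypothetical pre-aligned clusters for four placeholder species codes.
species = ["Egr", "Ath", "Ptr", "Vvi"]
clusters = {
    "c1": {"Egr": "MKV", "Ath": "MKI", "Ptr": "MRV", "Vvi": "MKV"},
    "c2": {"Egr": "AAG"},               # 1 of 4 species: dropped
    "c3": {"Egr": "WQ", "Vvi": "WE"},   # 2 of 4 species: kept (exactly half)
}
kept = select_clusters(clusters, species)
matrix = concatenate(clusters, species, kept)
print(kept)            # ['c1', 'c3']
print(matrix["Ath"])   # MKI--
```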
Extended Data Figure 4 | Dating of the Eucalyptus lineage-specific whole-genome
duplication event. a, Eucalyptus Ks whole-paranome (the set of all
duplicate genes in the genome) age distribution. The x axis shows Ks
(bin size of 0.1); the y axis shows the number of retained duplicate paralogous
gene pairs. b, Eucalyptus Ks anchor age distribution. The x axis shows
Ks (bin size of 0.1); the y axis shows the number of retained duplicate
anchors. Anchors falling within the Ks range of 0.8-1.5 were used for
absolute dating. c, Eucalyptus absolute dated anchors from the most recent
WGD. The smooth green curve represents the maximum likelihood normal fit
of dated anchors derived from the most recent WGD in Eucalyptus, whereas
the blue dots represent a histogram of the raw data. The dashed line indicates
the ML estimate of the distribution mode, whereas the dotted lines delimit the
corresponding 95% confidence interval. The mode of dated anchors is
estimated at 109.93 Myr ago, with lower and upper 95% boundaries at 105.96
and 113.91 Myr ago, respectively. d, Genome duplication pattern in the
core eudicot (rosid and asterid) ancestor and lineages leading to Solanum
(asterid), Vitis and Eucalyptus (basal rosids) and the core rosids. The three
Eucalyptus (E1-E3), Vitis (V1-V3) and Solanum (S1-S3) orthologues were
generated by the shared hexaploidy event (purple box, ~130 to 150 Myr ago),
and an additional set of Eucalyptus orthologues (E1'-E3') was created in the
lineage-specific WGD (orange boxes, ~110 Myr ago).
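The binning and window-selection steps in panels a-c can be sketched as below. This is a simplified illustration under stated assumptions: converting anchor pairs into absolute ages is done by a separate phylogenetic dating pipeline that is not reproduced here, so `normal_mle` simply takes already-dated ages (hypothetical values); the confidence interval around the mode is not computed.

```python
import math
from collections import Counter


def ks_histogram(ks_values: list[float], bin_size: float = 0.1) -> dict[float, int]:
    """Count duplicate pairs per Ks bin, keyed by the bin's lower edge."""
    # round() guards against float artefacts such as 0.3 / 0.1 == 2.9999999999999996
    bins = Counter(math.floor(round(k / bin_size, 9)) for k in ks_values)
    return {round(b * bin_size, 10): n for b, n in sorted(bins.items())}


def select_anchors(anchor_ks: list[float], lo: float = 0.8, hi: float = 1.5) -> list[float]:
    """Anchors inside the Ks window used for absolute dating."""
    return [k for k in anchor_ks if lo <= k <= hi]


def normal_mle(ages_myr: list[float]) -> tuple[float, float]:
    """Maximum likelihood normal fit; the fitted mode equals the mean."""
    n = len(ages_myr)
    mu = sum(ages_myr) / n
    sigma = math.sqrt(sum((a - mu) ** 2 for a in ages_myr) / n)  # MLE divides by n
    return mu, sigma


print(ks_histogram([0.05, 0.15, 0.12, 0.3]))      # {0.0: 1, 0.1: 2, 0.3: 1}
print(select_anchors([0.5, 0.8, 1.2, 1.5, 1.6]))  # [0.8, 1.2, 1.5]
mode, sd = normal_mle([105.0, 110.0, 115.0])      # hypothetical anchor ages
print(mode)                                       # 110.0
```

For a normal distribution the mode coincides with the mean, which is why the ML mean serves directly as the mode estimate here.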
Extended Data Figure 5 | Genome-wide analysis of tandem gene assemblies.
The number and distribution of contig breaks were evaluated for pairs of tandem
genes (located within 50 kb of each other). a, Distribution of the number of
contig breaks between gene pairs (blue bars) and cumulative proportion of gene
pairs separated by contig breaks (black line). b, Distribution of the number of
contig breaks per separation distance, showing that the number of breaks is
positively correlated with separation distance. The red line shows the
distribution of distance between gene pairs with three or more contig breaks.
c, Distribution of Ks divergence of tandem gene pairs in clusters with exactly
two tandem genes, showing the gradient of similarity (that is, age of duplication)
expected for authentic tandem gene pairs. d, Rate of tandem gene duplication
(TD) and gene loss in Eucalyptus grandis (Eg), Populus trichocarpa (Pt),
Vitis vinifera (Vv) and Arabidopsis thaliana (At). All of the rosid genomes
(except Arabidopsis) exhibit constant rates of tandem duplication and loss. The
rate of tandem gene duplication in Eucalyptus has been stable and consistently
higher than in Populus and Vitis. 1 Myr ~ 0.0026 transversions at fourfold
degenerate sites, consistent with Populus and Eucalyptus having diverged
~100 Myr ago.
[Figure: panel b, legend No breaks / 1 break / >2 breaks, x axis Distance between gene pair (kb); panel d, x axis 4DTV epoch, with inset table:
Species | TD/Myr | Loss/Myr
E. grandis | 1,351 | 0.092
P. trichocarpa | 452 | 0.087
V. vinifera | 253 | 0.055]
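Under the calibration stated in this legend (1 Myr ~ 0.0026 transversions at fourfold degenerate sites), converting a 4DTV value into time, and an event count into a per-Myr rate, is simple division, as sketched below. The helper names are hypothetical, and the sketch ignores saturation of fourfold sites and rate variation among lineages, which a real analysis would need to consider.

```python
TRANSVERSIONS_PER_MYR = 0.0026  # 4DTV calibration quoted in the legend


def dtv_to_myr(dtv: float) -> float:
    """Convert a 4DTV distance into an approximate age in Myr."""
    return dtv / TRANSVERSIONS_PER_MYR


def events_per_myr(event_count: int, dtv_start: float, dtv_end: float) -> float:
    """Average event rate (for example, tandem duplications) over a 4DTV epoch."""
    return event_count / (dtv_to_myr(dtv_end) - dtv_to_myr(dtv_start))


print(round(dtv_to_myr(0.26), 6))                 # 100.0
print(round(events_per_myr(135, 0.0, 0.026), 6))  # 13.5, hypothetical count
```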
Extended Data Figure 6 | Illumina PE100 read coverage of the ~760-kb
region containing an R2R3-MYB tandem gene array. Illumina PE100 reads
generated from BRASUZ1 (E. grandis) and X46 (E. globulus) were aligned to
the E. grandis (BRASUZ1, V1.0) genome assembly, and insert (green bars) and
sequence (blue line) coverage were examined for the ~760-kb region including an
R2R3-MYB tandem array (details in Supplementary Data 3) in the E. grandis
genome assembly. a, Read coverage profile of the BRASUZ1 reads mapped
to the region, showing 1X relative coverage across all nine of the tandem
duplicates (red blocks) in the region. b, X46 (E. globulus) reads mapped to
the region, showing 1X relative coverage over approximately half of the region,
with some tandem duplicates apparently absent from the E. globulus genome.
Note that insert coverage (green bars) is relatively higher for E. globulus (X46,
panel b) owing to the larger insert size of the genomic library sequenced for
X46 (~300 bp) than for BRASUZ1 (~150 bp).
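The notion of 1X relative coverage used in this legend can be illustrated as below: per-window read depth is divided by a genome-wide baseline, so that single-copy sequence sits near 1.0 and windows missing from the sequenced genome fall towards 0. This is a minimal sketch; the median baseline, the window layout and the 0.5 cut-off are assumptions, not the settings used for the figure, and the depths are hypothetical.

```python
from statistics import median


def relative_coverage(window_depths: list[float], genome_depths: list[float]) -> list[float]:
    """Scale per-window read depth by the genome-wide median depth (1.0 = 1X)."""
    baseline = median(genome_depths)
    return [d / baseline for d in window_depths]


def low_coverage_windows(rel: list[float], cutoff: float = 0.5) -> list[int]:
    """Indices of windows below a hypothetical cut-off, i.e. candidate absences."""
    return [i for i, r in enumerate(rel) if r < cutoff]


# Hypothetical depths: genome-wide windows and windows across a tandem array.
genome = [28.0, 30.0, 31.0, 29.0, 30.0]
array_windows = [30.0, 15.0, 0.0, 33.0]
rel = relative_coverage(array_windows, genome)
print(rel)                        # [1.0, 0.5, 0.0, 1.1]
print(low_coverage_windows(rel))  # [2]
```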
Extended Data Figure 7 | Alternative homozygous classes observed in the 28
M35D2 siblings as a function of position on chromosomes 1-11. Several
peaks of conserved heterozygosity (peaks >80%) are seen on all chromosomes
except 5 and 11. A 25-Mb region on chromosome 4, from 11 to 36 Mb, is
completely devoid of siblings homozygous for one of the alleles (red line), but
has roughly 25-32% of the siblings homozygous for the other allele (green line),
with the rest heterozygous, in a roughly 1:4 ratio. The blue line is the total
proportion of siblings out of 28 that are heterozygous in the region. One
would expect 50% under the null model, but almost the entire chromosome is
biased towards heterozygosity. In several other regions (for example,
chromosomes 6, 7, 9 and 10) both homozygous classes are depleted, suggesting
the presence of genetic load at different loci along the two parental homologues
and explaining the strong selection for heterozygosity in such regions.
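The comparison with the 50% heterozygosity expected under the null model can be made locus by locus with an exact binomial test, sketched below. The choice of test is an assumption for illustration (the legend does not state which test, if any, was applied), and the sibling counts are hypothetical.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes that are
    no more likely than the observed count k under Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # small relative tolerance so outcomes with equal probability are not dropped
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


# Hypothetical locus: 24 of the 28 siblings heterozygous, null expectation 50%.
print(binom_two_sided(24, 28))
```

When scanning many positions along 11 chromosomes, the resulting P values would also need a multiple-testing correction.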
Extended Data Figure 8 | Genes involved in lignin biosynthesis in woody
tissues of Eucalyptus. Relative (yellow-blue scale) and absolute (white-red
scale) expression profiles of secondary cell-wall-related genes implicated in
lignin biosynthesis. Detailed gene annotation and mRNA-seq expression data
are provided in Supplementary Data 9. Five novel Eucalyptus candidates that
have not previously been associated with lignification are indicated by asterisks
(Carocha et al., unpublished data). ST, shoot tips; YL, young leaves; ML, mature
leaves; FL, floral buds; RT, roots; PH, phloem; IX, immature xylem. Absolute
expression level (FPKM) is shown only for immature xylem.


[Figure: R2R3-MYB woody-preferential subgroups II and V, with per-tissue expression (ST, YL, ML, FL, PH, IX).]


Extended Data Figure 9 | Phylogenetic tree of R2R3-MYB sequences from
subgroups expanded in and/or preferentially found in woody species. A total
of 133 amino acid sequences from Eucalyptus grandis (50), Vitis vinifera (34),
Populus trichocarpa (40), Arabidopsis thaliana (6) and Oryza sativa (3),
corresponding to three woody-expanded subgroups (subgroups 5, 6 and AtMYB5,
based on the Arabidopsis classification) and five woody-preferential subgroups
(I through V), were analysed. The woody-preferential subgroups contain neither
Arabidopsis nor Oryza sequences. Sequences were aligned using MAFFT with the
FFT-NS-i algorithm (Supplementary Data 10). Evolutionary history was inferred by
constructing a neighbour-joining tree with 1,000 bootstrap replicates (bootstrap
support is shown next to branches) using MEGA5 (ref. 70). The evolutionary
distances were computed using the Jones-Taylor-Thornton substitution model, and
rate variation among sites was modelled with a gamma distribution (shape
parameter = 1). Positions containing gaps and missing data were not considered
in the analysis. The tree is drawn to scale, with branch lengths in the same units
as those of the evolutionary distances used to infer the phylogenetic tree.
RNA-seq-based relative transcript abundance data for six different tissues,
expressed as FPKM values (fragments per kilobase of exon per million fragments
mapped), are shown for each Eucalyptus gene next to each subgroup. ST, shoot
tips; YL, young leaves; ML, mature leaves; FL, flowers; PH, phloem; and IX,
immature xylem.
[Figure: species legend (Eucalyptus grandis, Vitis vinifera, Populus trichocarpa, Arabidopsis thaliana, Oryza sativa); heat map titled RNAseq-based Transcript Abundance (FPKM).]
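For readers unfamiliar with the method, the neighbour-joining algorithm named in this legend can be sketched as follows. This is the textbook algorithm applied to a precomputed distance matrix. It does not compute JTT + gamma distances, run bootstraps, or reproduce MEGA5's implementation details (for example, tie-breaking or the treatment of negative branch lengths); the five-taxon matrix is a small textbook-style example, not data from the study.

```python
def neighbor_joining(names: list[str], matrix: list[list[float]]) -> str:
    """Textbook neighbour-joining on a symmetric distance matrix; returns Newick."""
    D = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}  # row sums
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"  # the Newick string doubles as node label
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                # distance from u to every remaining node
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{D[a][b]:.3f});"


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(names, matrix))
# ((c:4.000,(a:2.000,b:3.000):3.000),(d:2.000,e:1.000):2.000);
```

The result is an unrooted tree written in Newick form; rooting (for example, on an outgroup) is a separate step.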
Extended Data Figure 10 | Phylogenetic tree of type II MIKC MADS-box
proteins. Neighbour-joining consensus tree of the type II MIKC subclade
using protein sequences from Eucalyptus grandis, Arabidopsis thaliana,
Populus trichocarpa and Vitis vinifera (Supplementary Data 11). Bootstrap
values from 1,000 replicates were used to assess the robustness of the tree.
Bootstrap values lower than 40% were removed from the tree. Eucalyptus genes
are denoted with green dots, Arabidopsis genes with red dots, Populus genes
with yellow dots and Vitis genes with blue dots. The gene model numbers from
Populus and Vitis were abbreviated to fit the figure (P. trichocarpa, Pt;
V. vinifera, Vv).
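Bootstrap support, and the 40% display cut-off mentioned above, can be sketched as below. This is an illustration under assumptions: resampling alignment columns with replacement is the standard nonparametric bootstrap, but building a tree for each replicate and counting how often each split (bipartition) recurs is left out, so `supported_splits` starts from hypothetical split counts. Sequence names are placeholders.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: resample alignment columns with replacement.
    The same column indices are used for every sequence."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def supported_splits(split_counts: dict[frozenset, int], replicates: int,
                     min_support: float = 40.0) -> dict[frozenset, float]:
    """Convert split counts to percentages and drop splits below the cut-off."""
    support = {s: 100.0 * n / replicates for s, n in split_counts.items()}
    return {s: v for s, v in support.items() if v >= min_support}


rng = random.Random(0)
replicate = bootstrap_alignment({"Eg1": "MKVLA", "Pt1": "MKILA"}, rng)
# Hypothetical counts of how often each split appeared in 1,000 replicate trees
counts = {frozenset({"Eg1", "Pt1"}): 950, frozenset({"Eg2", "Vv1"}): 300}
print(list(supported_splits(counts, 1000).values()))  # [95.0]
```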
Single-cell RNA-seq reveals dynamic
paracrine control of cellular variation


Alex K. Shalek, Rahul Satija, Joe Shuga, John J. Trombetta, Dave Gennert, Diana Lu, Peilin Chen, Rona S. Gertner,
Jellert T. Gaublomme, Nir Yosef, Schraga Schwartz, Brian Fowler, Suzanne Weaver, Jing Wang, Xiaohui Wang,
Ruihua Ding, Raktima Raychowdhury, Nir Friedman, Nir Hacohen, Hongkun Park, Andrew P. May & Aviv Regev


High-throughput single-cell transcriptomics offers an unbiased approach for understanding the extent, basis and function
of gene expression variation between seemingly identical cells. Here we sequence single-cell RNA-seq libraries prepared
from over 1,700 primary mouse bone-marrow-derived dendritic cells spanning several experimental conditions. We
find substantial variation between identically stimulated dendritic cells, both in the fraction of cells detectably expressing
a given messenger RNA and in the transcript’s level within expressing cells. Distinct gene modules are characterized by
different temporal heterogeneity profiles. In particular, a ‘core’ module of antiviral genes is expressed very early by a few
‘precocious’ cells in response to uniform stimulation with a pathogenic component, but is later activated in all cells. By
stimulating cells individually in sealed microfluidic chambers, analysing dendritic cells from knockout mice, and
modulating secretion and extracellular signalling, we show that this response is coordinated by interferon-mediated
paracrine signalling from these precocious cells. Notably, preventing cell-to-cell communication also substantially
reduces variability between cells in the expression of an early-induced ‘peaked’ inflammatory module, suggesting that
paracrine signalling additionally represses part of the inflammatory program. Our study highlights the importance of
cell-to-cell communication in controlling cellular heterogeneity and reveals general strategies that multicellular
populations can use to establish complex dynamic responses.


Variation in component molecules between individual cells may
have an important role in diversifying population-level responses,
but also poses therapeutic challenges. Although pioneering studies
have explored heterogeneity within cell populations by focusing on
small sets of preselected markers, single-cell genomics promises
unbiased exploration of the molecular underpinnings and consequences
of cellular variability.

We previously used single-cell RNA-seq to identify substantial
differences in messenger RNA (mRNA) transcript structure and abundance
across 18 bone-marrow-derived mouse dendritic cells 4 h after stimulation
with lipopolysaccharide (LPS, a component of Gram-negative
bacteria). Many highly expressed immune response genes were distributed
bimodally amongst single cells, originating, in part, from closely
related maturity states and the variable activation of a key antiviral
circuit. As these observations focused on a single pathogenic stimulus
and time point, they raised several questions about the causes and roles
of single-cell variability during the innate immune response. Examining
the dynamics of cellular heterogeneity, its pathogen specificity, and its
intra- and intercellular control required new approaches to profile large
numbers of cells from diverse conditions and genetic perturbations.

Here we use a microfluidic device to help prepare over 1,700 SMART-seq
single-cell RNA-seq libraries along time courses of bone-marrow-derived
dendritic cells responding to different stimuli (Fig. 1 and Extended
Data Fig. 1a). Combining computational analyses with diverse perturbations
(including stimulation of individual cells in isolated, sealed
microfluidic chambers and genetic and chemical alterations of paracrine
signalling), we show how both antiviral and inflammatory response
modules in dendritic cells are controlled by positive and negative
intercellular paracrine signalling that both promote and restrain variation.


Microfluidics-based single-cell RNA-seq 


We used the C1 single-cell Auto Prep System (Fluidigm; Fig. 1b) and a
transposase-based library preparation strategy to perform SMART-seq
(Supplementary Information) on 1,775 single dendritic cells, including
both stimulation time courses (0, 1, 2, 4 and 6 h) for three pathogenic
components (LPS, PIC (viral-like double-stranded RNA), and PAM
(synthetic mimic of bacterial lipopeptides)) and additional perturbations
(Fig. 1, Extended Data Fig. 1 and Supplementary Information).
For most conditions, we captured up to 96 cells (87 ± 8 (average ± s.d.))
and generated a matching population control (Fig. 1c, Supplementary
Information and Supplementary Table 1). We prepared technically matched
culture and stimulation replicates for the 2 h and 4 h LPS stimuli, and
independent biological replicates for the unstimulated (0 h) and 4 h LPS
experiments (Supplementary Information). We sequenced each sample
to an average depth of 4.5 ± 3.0 million read pairs, as single-cell
expression estimates stabilized at low read depths (Extended Data Fig. 2).
The quality of our libraries was comparable to published SMART-seq
data (Extended Data Fig. 1b, Supplementary Tables 1 and 2). Overall,
we successfully profiled 831 cells in our initial time courses and 944 cells
in subsequent experiments (Extended Data Fig. 1a and Supplementary
Tables 1 and 2). We excluded another 1,010 libraries that failed stringent
quality criteria (Supplementary Information and Extended Data Fig. 1c).

Aggregated in silico, single-cell expression measurements agreed with
the matching population controls (R = 0.87 ± 0.05), with correlations
plateauing once we had sampled ~30 cells (Supplementary Information,
Extended Data Fig. 1d-g). Technical and biological replicates were
reproducible (technical: aggregate R > 0.90; biological: aggregate R >
0.87; Extended Data Fig. 3), and our results were robust to variations in
several aspects of sample preparation (Supplementary Information and
Extended Data Fig. 1h-j). We removed 537 ‘cluster-disrupted’ dendritic
cells, a distinct subpopulation that matures as an artefact of isolation
and culturing (Supplementary Information and Extended Data Fig. 4),
retaining 1,238 dendritic cells for further analyses (Supplementary Tables 1
and 2).

Figure 1 | Microfluidic-enabled single-cell RNA-seq of dendritic cells
stimulated with pathogenic components. a, Schematic of Toll-like
receptor (TLR) sensing of PAM by TLR2, LPS by TLR4, and PIC by TLR3
(Supplementary Information). b, Microfluidic capture of a single dendritic
cell (top, cell circled in purple) on a C1 chip (CAD drawing, bottom).
c, Time-course expression profiles for induced genes (rows) in dendritic cells
(columns) at 0, 1, 2, 4 and 6 h post-stimulation with PAM (green), LPS (black),
or PIC (magenta) within populations (left) and individual cells (right).
Far right: gene projection scores onto the first three principal components
(PCs) (columns); bottom: contributions of each cell (columns) to the first three
PCs (rows).


Variability during immune responses

Principal components analysis (PCA) of gene expression profiles from
all three time courses together showed that dendritic cells spread along
a continuum of expression variation in each principal component (PC)
(Fig. 1c and Extended Data Fig. 1k-n). For example, although PC1
distinguished early from late time points for each stimulus, its scores also
varied substantially between cells within any single stimulus and time
point (Fig. 1c and Extended Data Fig. 1k-n), suggesting that some cells
were ahead of others, especially early (1-2 h).

Consistent with previous studies, pathogen-responsive genes partitioned
into co-regulated modules based on their population-level expression
profiles (Fig. 1c, left; Supplementary Information). Genes induced
in cells stimulated with LPS or PIC (cluster I, Fig. 1c) were enriched for
antiviral defence factors, including interferons and their targets (Bonferroni-
corrected P < 10°), whereas genes induced in cells stimulated with LPS
or PAM (cluster III, Fig. 1c) were enriched for inflammatory genes and
NF-κB targets (Bonferroni-corrected P < 10” °; Supplementary Table 3).

We used the single-cell gene expression profiles to partition these
main clusters into finer modules (Fig. 1c, black lines, right; Supplementary
Table 3; Supplementary Information) and applied a resampling
method (Supplementary Information, Extended Data Fig. 5d) to identify
four modules significantly associated with the three major PCs (Fig. 1c):
cluster Id (core antiviral module; enriched for annotated antiviral and
interferon response genes; for example, Ifit1, Irf7; Bonferroni-corrected
P < 10 *, Supplementary Table 3, Fig. 1c and Extended Data Fig. 5a)
was significantly associated with PC1; cluster IIIc (peaked inflammatory
module; showing rapid, yet transient, induction under LPS; for example,
Tnf, Il1a, Cxcl2) and cluster IIId (sustained inflammatory module;
exhibiting a continued rise in expression under LPS; for example, Mmp14,
Marco, Il6) were associated with PC2; and cluster IIIb (‘maturity’ module;
containing markers of dendritic cell maturation; for example, Cd83,
Ccr7 and Ccl22; Supplementary Information) was associated with PC3.
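The in silico aggregation and the ~30-cell plateau described above can be illustrated with a small sketch. Averaging TPM values across cells, the log1p transform and the Pearson correlation are assumptions made for illustration (the study's exact procedure is in its Supplementary Information), and the simulated cells are hypothetical.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def aggregate(cells: list[list[float]]) -> list[float]:
    """In silico population: mean TPM of each gene across cells (cells x genes)."""
    return [sum(gene) / len(cells) for gene in zip(*cells)]


def agreement(cells: list[list[float]], population_tpm: list[float]) -> float:
    """Correlation between log-transformed aggregate and population profiles."""
    agg = [math.log1p(v) for v in aggregate(cells)]
    pop = [math.log1p(v) for v in population_tpm]
    return pearson(agg, pop)


def saturation_curve(cells, population_tpm, sizes, rng):
    """Agreement for random subsets of increasing size (a plateau is expected)."""
    return {k: agreement(rng.sample(cells, k), population_tpm) for k in sizes}


rng = random.Random(1)
population = [100.0, 50.0, 10.0, 1.0, 0.0]  # hypothetical population TPM
cells = [[rng.expovariate(1 / p) if p > 0 else 0.0 for p in population]
         for _ in range(60)]
print(saturation_curve(cells, population, [5, 15, 30, 60], rng))
```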


Digital and analogue expression variability 

Genes from these four modules displayed distinct patterns of variation
that changed with time and stimulus (Fig. 2a, Extended Data Figs 5 and 6).
For example, early after LPS stimulation, core antiviral response genes
were detectably expressed only in some cells (that is, were bimodal) (Fig. 2a,
Extended Data Figs 5a and 6), but were turned on in most cells between
2 and 4 h (that is, became unimodal). In contrast, many peaked inflammatory
genes were induced by LPS in all cells early, but were detectable
only in some cells later (Fig. 2a, Extended Data Figs 5b and 6). Finally,
sustained inflammatory genes were induced early in most cells and
persisted at equal or elevated levels later (Fig. 2a, Extended Data Figs 5c and 6).
Some variation patterns changed between stimuli (for example, peaked
inflammatory genes remained detectably expressed in most cells late (6 h)
in PAM), whereas other patterns were similar for distinct pathogens
(for example, the antiviral modules Ia-Id under LPS and PIC) (Figs 1
and 2a and Extended Data Fig. 5a-c).

As noted previously from single-cell quantitative real-time polymerase
chain reaction (qRT-PCR) data, we distinguished two types of
heterogeneity: (1) digital (on/off) variation, reflecting the percentage of cells
detectably expressing a transcript; and (2) analogue variation, representing
expression level variability among detectably expressing cells.
Using the variance calculated over all cells as a metric of heterogeneity
conflates these two types of variation. We therefore explicitly modelled
our data using three parameters (Fig. 2b and Extended Data Fig. 7): the
mean (μ) and variance (σ²) of a gene’s expression among detectably
expressing cells, and the fraction of detectably expressing cells (α). In
this scheme, σ² and α signify analogue and digital variation, respectively.
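The three-parameter description can be sketched as follows, using the fixed threshold the study applies for appreciable expression, ln(TPM + 1) > 1. This minimal version computes nominal values only: it does not include the study's correction for detection efficiency (its maximum likelihood α), and the use of the sample (rather than population) variance is an assumption.

```python
import math
from statistics import mean, variance


def three_parameters(tpm: list[float], threshold: float = 1.0):
    """Return (alpha, mu, sigma2) for one gene across cells.

    alpha  : fraction of cells with ln(TPM + 1) > threshold (digital variation)
    mu     : mean ln(TPM + 1) among those expressing cells
    sigma2 : sample variance among expressing cells (analogue variation)
    """
    values = [math.log1p(t) for t in tpm]            # ln(TPM + 1)
    expressing = [v for v in values if v > threshold]
    alpha = len(expressing) / len(values)
    mu = mean(expressing) if expressing else float("nan")
    sigma2 = variance(expressing) if len(expressing) > 1 else float("nan")
    return alpha, mu, sigma2


# Hypothetical TPM values for one gene in five cells; alpha is 0.6 here.
print(three_parameters([0.0, 0.0, 5.0, 30.0, 120.0]))
```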

We computed α based on a fixed threshold for appreciable expression
(ln(TPM + 1) > 1; Supplementary Information and Extended Data
Fig. 7a, f), and then estimated μ and σ² across appreciably expressing
cells. This three-parameter model effectively described most (91%) of
our single-cell data (Fig. 2c, d, Supplementary Information and Extended
Data Fig. 7b). Our data did not support fitting with either a single
lognormal or a mixture of two fully parameterized lognormals (modelling
high and low expression states; Supplementary Information and Extended
Data Fig. 7c-e). Computed α values were consistent between technical
and biological replicates, but μ and σ² estimates were reproducible only
when genes were expressed in at least 10 or 30 cells, respectively
(Supplementary Note, Supplementary Information, Extended Data Figs 2c-e,
7g and 8).

Figure 2 | Time-dependent behaviours of single cells. a, Single-cell
expression distributions for three genes at each time point after stimulation
with PAM (top, green), LPS (middle, black), or PIC (bottom, magenta).
Distributions are scaled to have the same maximum height. Individual cells are
plotted as bars underneath each distribution. b, Three parameters describing
single-cell gene expression distributions: μ (green) and σ² (gold), the mean and
variance of RNA abundance in detectably expressing cells, respectively, and
α (blue), the fraction of cells with detectable expression (at ln(TPM + 1) > 1).
c, Examples of fit (grey) and measured Tnf expression distributions (black).
d, The values of μ, σ² and α (y axes, left to right) computed for Tnf at each time
point (x axis). Units for μ and σ² are ln(TPM + 1). e, Maximum likelihood
estimate of α (α_ML). Shown are the likelihood functions (dotted blue line) for Tnf
(matching c) used to determine α_ML (red line; vertical black line: nominal α;
Supplementary Information).

Our nominal α estimates are likely deflated owing to the detection limits
of single-cell RNA-seq. Indeed, we observe higher α values when
examining our existing RNA fluorescence in situ hybridization (RNA-FISH)
data (Extended Data Fig. 6g-j). By comparing our single-cell RNA-seq
and RNA-FISH data, we estimate that the transcript detection efficiency
of our single-cell RNA-seq is ~20%, consistent with previous reports.
We and others have also observed a strong relationship between the
average expression of a gene and its probability of detection (Extended
Data Fig. 7h). We thus used a conservative null model, in which this
relationship results solely from technical limitations (Supplementary
Information, Extended Data Fig. 7h), and determined the maximum likelihood
estimate of α (α_ML) for each gene after correcting for this relationship
(Fig. 2e, Extended Data Fig. 7j-l and Supplementary Information).
From this analysis, we estimate that ~45% of core antiviral genes and 30%
of peaked inflammatory genes are significantly bimodal in at least one
measured time point in the LPS response (likelihood ratio test (LRT),
Bonferroni-corrected P < 0.01; Supplementary Information and
Extended Data Fig. 7i).


Chromatin mark levels correlate with α

As the presence of a chromatin mark is, by definition, discrete in a single
cell, we reasoned that population ChIP-seq profiles of active histone
marks (for example, histone 3 lysine 27 acetylation (H3K27ac)) should
more closely reflect the fraction of cells with detectable transcripts (α)
than population-level expression. Supporting this hypothesis, the observed
α for a gene was strongly correlated (mean R for binned data = 0.89;
Supplementary Information) with its promoter-associated ChIP-seq
density (collected under identical conditions), even within a fixed
population expression range (Fig. 3a top/middle, rows). In contrast, a gene’s
population-level expression was not correlated (mean R for binned data
= -0.02) with H3K27ac promoter levels within a fixed α range (Fig. 3a
top, middle; columns). We note that H3K27ac and population-level
expression remained correlated within a fixed range of μ (instead of α;
Fig. 3b). A partial correlation analysis focused on either all immune
response genes or ‘bimodal’ genes (LRT, P < 0.01) yielded similar results
(P > 0.1, after controlling for α; Fig. 3c). Digital variation did not correlate
with histone 3 lysine 4 trimethylation (H3K4me3) levels (Fig. 3a,
bottom), in line with previous observations that H3K4me3 is not as tightly
correlated with active transcription. Emerging single-cell epigenomic
technologies should help to further explore this relationship.
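The partial correlation analysis mentioned above asks whether expression and promoter H3K27ac remain correlated once α is held fixed. A sketch using the standard first-order partial correlation formula is shown below; that the study used exactly this formula is an assumption, and the per-gene vectors are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: list[float], y: list[float], z: list[float]) -> float:
    """First-order partial correlation of x and y, controlling for z.
    Undefined when z is perfectly correlated with x or y."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


# Hypothetical per-gene values: alpha, promoter H3K27ac and population expression
alpha = [0.1, 0.3, 0.5, 0.7, 0.9]
h3k27ac = [1.0, 2.1, 2.9, 4.2, 5.0]
expression = [0.5, 1.4, 2.6, 3.4, 4.6]
print(round(pearson(expression, h3k27ac), 3),
      round(partial_corr(expression, h3k27ac, alpha), 3))
```

If α accounts for most of the shared signal, the partial correlation falls well below the raw correlation.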


Dynamic responses via shifts in α and μ

An average (population) increase in the expression of bimodally expressed
genes may represent changes in the amount of transcript made by expressing
cells (shifts in μ), the proportion of expressing cells (shifts in α), or
both. For each pair of consecutive time points, we examined the proportion
of genes in each module with a significant change in: (1) μ (Wilcoxon
rank-sum test); (2) α (LRT, controlling for the aforementioned confounding
relationship between average expression and detection efficiency;
Supplementary Information); or (3) both. Given the limitations in estimating
α and μ, we only considered genes that were annotated as bimodal
in at least one time point in the relevant time course and expressed in
at least 10 cells in both time points (Supplementary Information). We
excluded the unstimulated time point, when most immune response
genes were not yet expressed.
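The two per-gene tests can be sketched in plain Python as below: a Wilcoxon rank-sum test (normal approximation, no tie correction) for a shift in μ, and a two-sample binomial likelihood ratio test for a shift in α. Note the important simplification: the study's α test additionally controls for the dependence of detection on average expression, which this sketch does not attempt. All inputs are hypothetical.

```python
from math import erfc, log, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_p(x: list[float], y: list[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    w = sum(average_ranks(list(x) + list(y))[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return erfc(abs(z) / sqrt(2))


def _binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood without the constant term (0 * log 0 taken as 0)."""
    ll = 0.0
    if k:
        ll += k * log(p)
    if n - k:
        ll += (n - k) * log(1 - p)
    return ll


def alpha_change_lrt_p(k1: int, n1: int, k2: int, n2: int) -> float:
    """LRT of 'same fraction of expressing cells' versus 'different fractions'."""
    p1, p2, pooled = k1 / n1, k2 / n2, (k1 + k2) / (n1 + n2)
    stat = 2 * (_binom_loglik(k1, n1, p1) + _binom_loglik(k2, n2, p2)
                - _binom_loglik(k1, n1, pooled) - _binom_loglik(k2, n2, pooled))
    return erfc(sqrt(max(stat, 0.0) / 2))  # chi-square (1 df) upper tail


# Hypothetical ln(TPM + 1) values among expressing cells at two time points
t1 = [2.1, 2.5, 1.8, 3.0, 2.2, 2.7]
t2 = [3.9, 4.4, 3.5, 4.1, 4.8, 3.7]
print(rank_sum_p(t1, t2))
# Hypothetical counts of expressing cells: 12 of 80 at 1 h, 58 of 82 at 2 h
print(alpha_change_lrt_p(12, 80, 58, 82))
```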

Under LPS stimulation, core antiviral and sustained inflammatory
genes had the strongest increases in α (alone or with μ) at early time
points (Fig. 3d, top; Extended Data Fig. 5e, f), and transitioned to high
and unimodal expression by 4 h (Figs 1 and 2). In contrast, α decreased
at later time points for peaked inflammatory genes, especially from 2 to
4 h (Fig. 3d, middle; Extended Data Fig. 5f). The temporal patterns in
core antiviral gene activation were shared between LPS and PIC. However,
unlike in LPS, peaked inflammatory gene expression did not diminish
under PAM, and we did not observe statistically significant decreases
in α at later time points (Fig. 3d). These coherent shifts suggest that
variability reflects regulated immune response phenomena, rather than
unconstrained stochastic transcription.

Figure 3 | Dynamic changes in variation during stimulation. a, b, The
relationship between expression and H3K27ac binding depends on α (a), but
not on μ (b). Plots show average promoter read density (black high; white low;
scale bar, bottom) for H3K27ac in LPS 2 h (a, b, left) and unstimulated cells
(a, middle; b, right), or H3K4me3 in 2 h LPS (a, right) in genes corresponding to
each of 10 quantile bins of population expression (y axis) and each of 10
quantile bins of α (a, x axis) or μ (b, x axis) (Supplementary Information). c, Bar
plots showing P values of correlation between average expression levels and
H3K27ac only for immune response genes either as is (red) or when controlling for
μ (blue) or α (green). Matching R values for all genes: 0.29 (LPS 1 h, as is),
0.18 (LPS 1 h, controlling for μ), 0.06 (LPS 1 h, controlling for α), 0.33 (LPS 2 h,
as is), 0.23 (LPS 2 h, controlling for μ), 0.08 (LPS 2 h, controlling for α).
d, Dynamic changes in α and μ in each module. Bar plots showing, for each
module in select conditions (annotated on top), the fraction of genes (y axis)
with a significant change only in α (P < 0.01, likelihood ratio test, blue), only in
μ (P < 0.01, Wilcoxon test, green), or in both (each test independently, light
blue), at each transition (x axis). The number of genes over which the
proportion is calculated is marked on top of each pair of bars (Supplementary
Information, Extended Data Fig. 5f).


Intercellular determinants of variation

Both differences in intracellular components and changes in the
cellular microenvironment can affect heterogeneity. In particular,
slow diffusion of cytokines and chemokines could lead to local
variation in intercellular signals. As the core antiviral module is enriched
for targets of IFN-β, we speculated that upstream variability in IFN-β
exposure may drive its heterogeneity (median α = 0.52; 30% of genes
significantly bimodal, P < 0.01, LRT; Extended Data Fig. 9), and thus
profiled cells 2 h after IFN-β stimulation. Supporting our hypothesis,
cells stimulated with IFN-β for 2 h exhibited sharply reduced digital
variation in the core antiviral module (Fig. 4a; median α = 0.82; 7% of
genes significantly bimodal).


Precocious expressers of antiviral genes 
We next explored the cellular source of interferon in the native LPS 
response. At 2 h following LPS, Ifnb1 was bimodally expressed (P< 10 *, 
LRT) and correlated with the expression of the core antiviral module 
(Extended Data Fig. 9a, d, e). This observation, together with the sup- 
pression of digital variation under an IFN-B stimulus, suggested that, 
in response to LPS, a few cells may first produce (Extended Data Fig. 9d) 
and secrete a wave of interferon, leading to a gradual coordination of 
the core antiviral module at later time points via paracrine signalling. 
To test this hypothesis, we computed a core antiviral activation score 
(Supplementary Information) for each cell and compared scores across 
the LPS time course (Fig. 4b, Extended Data Figs 9e, f and 10a and
Supplementary Information). Although most cells activated the module
between 2 h and 4 h, we discovered two cells with strong core antiviral
activation at 1 h (Fig. 4b, c, Extended Data Fig. 9f, i, yellow asterisks).

Figure 4 | IFN-β feedback drives heterogeneity in the expression of core
antiviral targets. a, Single-cell expression distributions for Rsad2 (top) and
Stat2 (bottom) after stimulating with LPS (left, black) or IFN-β (right, magenta)
for 2 h. b, The core antiviral score (y axis; Supplementary Information,
Extended Data Figs 9f and 10a) for each LPS-stimulated cell at each time point.
The two precocious cells (yellow asterisk) have unusually high antiviral scores
at 1 h LPS. c, RNA-FISH confirms the presence of rare precocious responders
(arrow; yellow asterisk), positive for both Ifnb1 (magenta) and Ifit1 (cyan)
1 h after LPS stimulation. Grey, cell outlines. Scale bar represents 25 μm.

We verified the existence and scarcity of these precocious cells 1 h after
LPS stimulation by RNA-FISH (Fig. 4c, Supplementary Information);
here, appreciable Ifit1 and Ifnb1 co-expression was detected in only
0.8% of cells (23 of 2,960, mRNA count = 5 copies, P= 2 X 107°, 
proportion test). These precocious cells were indistinguishable from 
the others except in their expression of the ~100 core antiviral genes 
(Extended Data Fig. 9j, k). We observed similar early responding cells 
following PIC or PAM stimulation (Extended Data Fig. 9f, h and 10a). 

Although these precocious cells are reminiscent of the ‘sentinels’ that 
have been reported in viral infections and stimulations of fibroblasts
(Supplementary Note, Supplementary Information), we note that, in 
those studies, variable response may be partially due to differences in 
cells’ ability to sense and respond to the primary stimulus (for example, 
due to lack of viral sensing or replication). In contrast, all dendritic cells 
rapidly sense and respond to LPS, as evidenced by the unimodal activa- 
tion of peaked inflammatory genes at early time points (Extended Data 
Figs 5b-d, 10a; Fig. 2a, Tnf). 
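The per-cell core antiviral activation score used in Fig. 4b and Fig. 5d is defined in the Supplementary Information, which is not reproduced here. As a stand-in only, the sketch below scores each cell by averaging the z-scored ln(TPM + 1) values of a module's genes. The z-score averaging, the function name `module_scores` and the toy gene values are assumptions for illustration, not the authors' definition; under any score of this general kind, a precocious cell would appear as an outlier at an early time point.

```python
import math
import statistics


def module_scores(expr, module_genes):
    """Hypothetical per-cell module activation score (not the paper's definition).

    expr: dict mapping gene -> list of TPM values, one per cell, in a shared cell order.
    module_genes: genes belonging to the module (e.g. the core antiviral module).
    Each cell's score is the mean, over module genes, of that gene's z-scored
    ln(TPM + 1) value in the cell.
    """
    n_cells = len(next(iter(expr.values())))
    totals = [0.0] * n_cells
    used = 0
    for gene in module_genes:
        if gene not in expr:
            continue
        logs = [math.log(t + 1.0) for t in expr[gene]]
        mean = statistics.fmean(logs)
        sd = statistics.pstdev(logs)
        if sd == 0:
            continue  # gene is identical in every cell, so it cannot rank cells
        for i, x in enumerate(logs):
            totals[i] += (x - mean) / sd
        used += 1
    return [t / used for t in totals] if used else totals


expr = {
    "Ifit1": [0, 0, 0, 500],
    "Rsad2": [0, 1, 0, 300],
    "Actb": [900, 950, 880, 910],  # housekeeping gene, not in the module
}
scores = module_scores(expr, ["Ifit1", "Rsad2"])
```

In this toy input the fourth cell scores highest because it is the only one expressing both module genes strongly, mimicking a precocious responder.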


Intercellular communication and variation 

To examine whether the rare precocious cells were required for coordinating
the core antiviral response, we developed an approach to stimulate
cells in the absence of cell-to-cell communication. Modifying the
standard C1 workflow, we captured individual unstimulated dendritic
cells in a C1 chip (Supplementary Information), washed in LPS-containing
media, and then immediately sealed each microfluidic chamber to isolate
stimulated cells individually for 4 h (on-chip stimulation, Supplementary
Information, Fig. 5a). Key experimental conditions, including cell
density, were similar between the in-tube and on-chip experiments
(Supplementary Information).

In the absence of cell-to-cell communication, core antiviral module
genes were bimodally expressed (Fig. 5), with only 8 cells (20%) exhibiting
weak activation of the core antiviral module at 4 h (Fig. 5b–d,
Extended Data Fig. 9e), probably mimicking the precocious cells observed
in-tube at 1 h. This observation suggests an approximate bound for the
number of cells capable of autonomously inducing a response by 4 h.
Removing cell-to-cell communication also downregulated the expression
of maturation markers in all cells and some of the sustained inflammatory
genes (Extended Data Fig. 10a), although other key inflammatory
genes were unaffected.

Surprisingly, blocking intercellular communication also sharply altered
the single-cell expression of peaked inflammatory genes (Fig. 5b–d).
Genes encoding key inflammatory cytokines (for example, Tnf, Cxcl1)
switched from bimodal (α = 0.77, 0.56, respectively) to unimodal (α = 1.0,
0.91; LRT for corresponding α: P< 10 *4,P<10 '%, respectively)
expression on-chip (Fig. 5b, c). Indeed, a large portion of the peaked
inflammatory genes that were bimodal (LRT P < 0.01) after a 4 h LPS
stimulation in-tube shifted to unimodal expression on-chip (Extended
Data Fig. 10a, b; P < 0.01, hypergeometric test), indicating that
cell-to-cell signalling is required for dampening the peaked inflammatory
program at later time points following LPS. The opposite behaviours
of the core antiviral and peaked inflammatory modules indicate that
intercellular communication can have opposing effects on variation for
different gene modules within the same cell.

Figure 5 | Microfluidic blocking of cell-to-cell signalling affects response
heterogeneity in the core antiviral and peaked inflammatory modules.
a, Experimental blocking of cell-to-cell communication. Left: C1 chip; right:
actuation of microfluidic valves (red bars), following on-chip LPS stimulation,
isolates individual cells in sealed chambers, preventing intercellular signalling.
b, Expression of the genes (rows) in the core antiviral (Id, top rows) and
peaked inflammatory (IIIc, bottom rows) modules in single cells (columns)
from the in-tube (left) and on-chip (right) stimulations. c, Gene expression
distributions for representative genes from the core antiviral (top) and peaked
inflammatory (bottom) modules in the in-tube (left, black) or on-chip (right,
blue) 4 h LPS stimulation. d, Violin plots of, top to bottom, the core antiviral
(Supplementary Information, top), peaked inflammatory (middle), and
sustained inflammatory (bottom) scores for individual cells from the
stimulation conditions listed on the x axis. Yellow asterisks: the two precocious
cells from Fig. 4 (Extended Data Fig. 10a).


19 JUNE 2014 | VOL 510 | NATURE | 367


©2014 Macmillan Publishers Limited. All rights reserved


ARTICLE


IFN-β and peaked inflammatory genes

On-chip isolation conflates the effects of different paracrine signals
and the loss of cell-to-cell contact. To distinguish these situations, we
first profiled dendritic cells from interferon receptor knockout mice
(Ifnar1−/−). As expected, and consistent with previous findings, antiviral
gene expression was undetectable at 4 h in all Ifnar1−/− dendritic
cells, implying that even the precocious cells require autocrine interferon
feedback to activate and sustain their core antiviral responses (Extended
Data Fig. 10g). This is further supported by the decoupling of the
expression of Ifnb1 and the core antiviral module in Ifnar1−/− dendritic cells
stimulated with LPS for 2 h (Extended Data Fig. 9e).

Removal of interferon signalling also strongly affected the peaked
inflammatory module: after 4 h of LPS stimulation, Ifnar1−/− cells
showed a similar increase in the fraction of activated cells as was seen
in the on-chip experiment (Fig. 5d, Extended Data Fig. 10a, d, g),
suggesting that the absence of interferon signalling, rather than changes
in cell-to-cell contact, was the major driver. Furthermore, dendritic
cells lacking Stat1, a gene encoding a key transcription factor mediating
interferon responses, also exhibited increased activation and
decreased digital variation in peaked inflammatory genes (P < 0.01; 
hypergeometric test; Fig. 5d and Extended Data Fig. 10a, e, g, i). Con- 
versely, the sustained inflammatory module was not appreciably affec- 
ted by the absence of interferon signalling (Fig. 5d and Extended Data 
Fig. 10a, g), implying a different mechanism for its downregulation on-chip. 
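The hypergeometric tests cited above (for the overlap between bimodal peaked inflammatory genes and genes that shift on-chip or in Stat1-deficient cells) ask whether the overlap between two gene sets is larger than expected by chance. A minimal upper-tail version can be written with the Python standard library alone. The function name and the gene-set sizes in the usage line are hypothetical placeholders; the actual counts and background gene set are given in Extended Data Fig. 10 and the Supplementary Information.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: size of the gene universe tested
    successes:  genes in the first set (e.g. bimodal module genes)
    draws:      genes in the second set (e.g. genes shifting to unimodal)
    k:          observed overlap between the two sets
    """
    upper = min(successes, draws)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(population, draws)


# Hypothetical counts: 40 of 60 module genes fall in a set of 1,000 shifting
# genes drawn from 10,000 tested genes (about 6 would be expected by chance).
p = hypergeom_upper_tail(40, 10_000, 60, 1_000)
```

Exact integer arithmetic avoids the underflow that naive factorial-based formulas can hit for gene sets of this size.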


Second paracrine wave for downregulation 
Interferon response targets can cross-inhibit inflammatory gene expression
either through the direct formation of repressive complexes, for
example, the STAT1-containing ISGF3, or by inducing the production
of anti-inflammatory cytokines. The few cells with on-chip antiviral
activation exhibited no change in peaked inflammatory gene expression
(Fig. 5b). This suggests that the repression of peaked inflammatory
genes, unlike antiviral activation, is not directly downstream of IFN-β
signalling, but rather may be mediated by a second IFN-β/STAT1-dependent
paracrine signal. Peaked induction through two asynchronous
paracrine signals is reminiscent of the activation and contraction of
keratinocytes following wounding and immune infiltration, respectively.
To test this hypothesis further, we added brefeldin A (GolgiPlug), a 
secretion inhibitor, either simultaneously with LPS (0h) or at 1 or 2h 
after stimulation, and measured single-cell RNA-seq profiles at 4h 
(Fig. 5d, Extended Data Fig. 10a–c). Inhibiting secretion at the time of
LPS addition strikingly dampened the antiviral response, similar to the 
on-chip experiment. However, adding brefeldin A at 2 h did not affect 
the activation of the core antiviral module and adding it at 1 h had only 
a modest effect. This indicates that the first hour represents the crucial 
paracrine window for this response. In contrast, for the peaked inflam- 
matory module, addition at each of the three time points resulted in the 
module remaining aberrantly activated at 4h, as on-chip. Collectively, 
these experiments show that paracrine interferon signalling events before 
the 1h time point are crucial for antiviral activation, whereas subsequent, 
separate signalling is responsible for the desynchronized dampening of 
peaked inflammatory gene expression (Supplementary Note, Supplemen- 
tary Information). 


Discussion 


Here we have analysed how gene expression variation between indivi- 
dual dendritic cells changes with stimulus and time to dissect the regu- 
lation of heterogeneity across this immune response. Our statistical 
analysis reveals that changes in digital (on/off) variation can encode a 
diversity of temporal response profiles (Fig. 3d, Extended Data Fig. 5f). 
For example, late-induced core antiviral genes are very weakly expressed 
early, on average, but are highly expressed in a few precocious cells; the 
progressive dampening of peaked inflammatory genes originates, in
part, from changes in the fraction of cells detectably expressing these 
transcripts, rather than a uniform, gradual decrease in their expression 
in all cells. 

Such complex average responses can be generated not only through 
intricate intracellular circuits in each cell, but also through intercel- 
lular communication between cells, as we show for both modules. For 
example, we uncovered a small number of precocious cells that express 
Ifnb1 and core antiviral genes as early as 1 h after LPS stimulation, and 
through the secretion of IFN-β, help activate core antiviral genes in
other cells to coordinate the population response. These cells are indis- 
tinguishable from the rest, except in their expression of the core antiviral 
module (Extended Data Fig. 9j, k), and yet are crucial for an efficient 
and timely population response (Supplementary Note, Supplementary 
Information). 

IFN-β signalling also dampens a subset of induced inflammatory
genes at later time points, and our brefeldin A (GolgiPlug) experiments
suggest that a secondary, IFN-β-dependent signal is involved (Extended
Data Fig. 10j, k). This is consistent with a model in which IFN-β secreted
by a few cells induces the expression and secretion of secondary
anti-inflammatory cytokines from a subset of cells, which, in turn, attenuate
the peaked inflammatory responses of their neighbours. Computational
analyses, genetic perturbations and recombinant cytokine experiments
suggest that IL-10 may be involved in this second wave of negative
signalling (Extended Data Fig. 10h, Supplementary Table 4), but further
experiments are needed to fully elucidate the mechanism (Supplementary
Note, Supplementary Information). One involved component may
be the RNA degradation factor ZFP36 (TTP), whose targets are enriched
in the peaked inflammatory module.

The ability of precocious cells to influence others via paracrine
signalling may be an efficient strategy for quorum sensing, but it may also
be perilous. If the activation threshold is too low, a few stochastically
responding cells could induce an inappropriate immune response. Indeed,
this is observed in autoimmune diseases such as systemic lupus erythematosus
(SLE), in which excess IFN-β production potentiates auto-reactive
dendritic cell activation. In contrast, excessively stringent thresholds
may limit rapid responses to a viral infection, or the dampening of
chronic inflammation (for example, in rheumatoid arthritis or ulcerative
colitis). Thus, individual cells probably place tight controls on
the regulation of key cytokines, preferring different induction strategies
under different stimuli to balance responsiveness and control. Indeed,
similar population-level Ifnb1 expression in LPS/PIC (Extended Data Fig. 9c)
stems from different underlying phenomena: a substantial fraction of cells
express the Ifnb1 transcript moderately at 2 h LPS (α = 0.35, μ = 5.1),
whereas just a few cells express Ifnb1 very highly at 2 h PIC (α = 0.07,
μ = 6.31; uncorrelated with the cell’s activation of the antiviral response;
Extended Data Fig. 9e).
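A rough calculation shows how these two regimes can yield population-level Ifnb1 signals of the same order. Assuming that non-detecting cells contribute approximately nothing and that every detecting cell sits exactly at ln(TPM + 1) = μ (that is, ignoring σ², which would raise the linear-scale mean), the population average is roughly α(e^μ − 1) TPM. This simplification and the helper name are ours, for illustration, not the paper's model.

```python
import math


def approx_population_tpm(alpha, mu):
    """Crude population-average TPM: alpha * (exp(mu) - 1).

    Simplifying assumption: non-detecting cells contribute ~0 TPM and all
    detecting cells express at exactly ln(TPM + 1) = mu (sigma^2 ignored).
    """
    return alpha * (math.exp(mu) - 1.0)


lps_2h = approx_population_tpm(0.35, 5.1)   # many cells, moderate Ifnb1
pic_2h = approx_population_tpm(0.07, 6.31)  # few cells, very high Ifnb1
```

Both come out at a few tens of TPM (roughly 57 and 38), within a factor of two of each other, even though α differs fivefold; a bulk measurement alone could not tell the two strategies apart.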

Using microfluidics, we achieved the statistical power needed to track 
transcriptome-wide changes in single-cell expression variation across a 
variety of conditions, as well as to identify functionally important, rare 
responses. Microfluidics also allowed us to finely control the stimulation 
of our cells. Similar and improved techniques will be essential for char- 
acterizing other rare sub-populations, such as cancer stem cells, and for 
studying heterogeneous clinical samples and tissues. Further innova- 
tions in massively parallel manipulation and profiling of single cells will 
continue to improve our understanding of the rich diversity in, and dy- 
namic functional communities that constitute, multicellular populations. 


METHODS SUMMARY 


Bone-marrow-derived mouse dendritic cells were prepared as previously described
and stimulated with pathogenic stimuli for specified time periods. The C1 Single-Cell
Auto Prep System (Fluidigm) was used to perform SMARTer (Clontech) whole
transcriptome amplification (WTA) on up to 96 individual cells. WTA products
were then converted to Illumina sequencing libraries using Nextera XT (Illumina).
RNA-seq libraries were also made from 10,000 cells from each parent population
(population control). Each sample was sequenced on an Illumina HiSeq 2000 or
2500, and expression estimates (transcripts per million; TPM) for all UCSC-annotated
mouse genes were calculated using RSEM. Data were further analysed as described
in the Supplementary Information. Additional experiments were performed using 
RNA-FISH (Panomics), on-chip isolated stimulation, knockout mice, secretion 
blockers (GolgiPlug, BD Biosciences), protein synthesis blockers (cycloheximide, 
Sigma), and recombinant cytokines. Full Methods and any associated references 
are provided in the Supplementary Information. 
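For readers less familiar with the TPM unit used for all expression estimates: TPM divides each gene's read count by its (effective) transcript length and then rescales so that the values in a sample sum to one million. The sketch below shows only that final normalization on already-assigned counts. RSEM itself estimates the counts probabilistically, including for reads that map to several genes, which this sketch does not attempt; the function name, gene names and numbers are hypothetical.

```python
def transcripts_per_million(counts, effective_lengths):
    """Convert per-gene read counts to TPM.

    counts: dict gene -> (expected) read count
    effective_lengths: dict gene -> effective transcript length (any fixed unit)
    """
    # Reads per unit length, so longer transcripts are not over-counted.
    rates = {g: counts[g] / effective_lengths[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Rescale so the sample sums to one million.
    return {g: 1e6 * r / total for g, r in rates.items()}


tpm = transcripts_per_million({"GeneA": 100, "GeneB": 100}, {"GeneA": 1000, "GeneB": 2000})
```

Equal counts on a transcript twice as long give half the TPM; single-cell values are then analysed on the ln(TPM + 1) scale.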


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
Extended Data Figure 1 | Single-cell RNA-seq of hundreds of dendritic cells.
a, Overview of experimental workflow. b, Shown are read densities for seven
representative genes (two housekeeping genes (Rpl3 and Actb) and five
immune response genes (Ifnb1, Ifit1, Ccr7, Tnf, Marco)) across 60 single cells
(blue) and one population control of 10,000 cells (grey; bulk population) after a
4 h LPS treatment. c, Distribution of failure scores for all single cells. Single
cells with failure scores above 0.4 were discarded (see Supplementary
Information). d–g, Comparisons of expression estimates for the average single
cell and the bulk population. d, Scatter plots showing for each gene the relation
between the average single-cell expression (y axis) and bulk population-level
expression (x axis) for each of four time points following LPS stimulation
(1, 2, 4 and 6h, left to right). e, The Pearson correlation coefficients (y axis) 
of each comparison, as in d, for each of the time points and stimuli presented in 
Fig. 1, as a function of the number of cells captured in the respective experiment 
(x axis). f, Scatter plots showing the residual (population-level expression 
minus the single cell average) in a LPS 1h experiment (x axis) versus the 
residual in each of 3 other experiments (y axis, left to right): LPS 6h, PIC 4h 
and PAM 2h. g, The Pearson correlation coefficient (y axis) between the 
bulk population level expression and the single-cell expression average when a 
different number of sub-sampled cells (x axis) is included in the single-cell 
average. h, i, Effects of Hoechst dye and periodic mixing on mRNA expression.
h, Comparable expression levels after 4h LPS with the addition of small 
amounts of Hoechst to aid in cell counting (x axis) and when no dye is used 
(y axis), when looking at all genes (left) or only immune response genes (right). 
i, Comparable expression levels after 4h LPS with hourly mixing (x axis) or 
with no mixing (y axis), when looking at all genes (left) or only immune 
response elements (right). j, Core antiviral, peaked inflammatory, and 
sustained inflammatory module activation scores for a 0.1× LPS stimulation.
Shown are violin plots of the scores (y axis) for the core antiviral
(Supplementary Information, top), peaked inflammatory (Supplementary
Information, middle), and sustained inflammatory modules (Supplementary
Information, bottom) for each cell in (left to right): LPS 0 h, 1× (100 ng ml−1)
LPS 4 h, and 0.1× (10 ng ml−1) LPS 4 h. k–n, PCA of stimulated dendritic
cells. k, First two principal components (PCs) from a PCA performed on
the LPS stimulation time course. From top to bottom: unstimulated/LPS 0 h,
LPS 1 h, LPS 2 h, LPS 4 h, LPS 6 h. l–n, PCAs (left) and the distributions of
scores (right) for each of the first three PCs for samples collected after
stimulation with LPS (l), PAM (m), or PIC (n), for 1 (yellow), 2 (blue),
4 (grey) and 6 (red) hours. A single PCA was performed for all cells in all three
time courses.
Extended Data Figure 2 | Effects of shallow read depth on expression
estimates. a, b, A million reads per cell are sufficient to estimate expression
levels. a, Scatter plot for a single cell (from Shalek et al.) showing the relation
between expression estimates calculated using 30 M reads (x axis) or a
sub-sample of 1 M reads (y axis). b, Scatter plots for six different dendritic cells
stimulated for 4 h with LPS. Each plot shows the relation between expression
estimates calculated using all reads (x axis; number of reads marked on axis
label) or a sub-sample of 1 M reads (y axis). In all cases, R > 0.99. Note that
although, in principle, no gene should be estimated as present only in a
subsample but not the full data set, this does occur for a very small number of
genes (for example, four genes in cell 3), representing a nuanced technical
error in RNA-seq estimation. Consider two expressed genes, A and B, from
distinct loci, but with a short stretch of sequence identity. At low sequencing
depth, if reads only map to the shared region, estimation tools, such as RSEM
(used here), can erroneously guess which gene is expressed, such that additional
sequencing depth can ‘flip’ the assignment of an uncertain read from gene A
to gene B. These cases are extremely rare, and have a negligible effect.

c–e, A million reads per cell are sufficient to estimate μ, σ², and α. Scatter plots
showing the relation between α (c), μ (d), and σ² (e) values estimated using
10 M reads per cell (on average; x axis) or a sub-sample of 1 M reads per cell
(y axis) from RNA-seq libraries prepared from individual bone-marrow-derived
dendritic cells stimulated for 4 h with LPS. c, For almost all genes,
1 M reads are sufficient to estimate α. For a very small fraction (<0.1%) of
weakly expressed genes, estimates of α are improved with increased sequencing
depth. For μ (d) and σ² (e), estimates are plotted for all genes (left), only
genes detected in more than 10 cells (middle), or only those genes detected in
more than 30 cells (right).
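Extended Data Fig. 2 compares estimates from all reads with those from a 1 M-read sub-sample. The caption does not state how the sub-sample was drawn; one natural choice, assumed here, is to draw reads uniformly without replacement from each cell's library and re-tally them per gene. The sketch expands counts into individual reads, which is fine for illustration but would be memory-hungry for a 30 M-read library; the function name, gene names and counts are hypothetical.

```python
import random
from collections import Counter


def downsample_reads(counts, n_reads, seed=0):
    """Draw n_reads reads uniformly without replacement from per-gene read counts."""
    # One list entry per read, labelled by the gene it was assigned to.
    pool = [gene for gene, c in counts.items() for _ in range(c)]
    if n_reads > len(pool):
        raise ValueError("cannot draw more reads than the library contains")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, n_reads))


full = {"Actb": 6000, "Tnf": 3000, "Ifit1": 1000}
sub = downsample_reads(full, 1000)
```

Expression estimates would then be recomputed from `sub` and compared with those from `full`, as in panels a and b.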
Extended Data Figure 3 | Technical and biological reproducibility.
a–d, Scatter plots showing the relationship between the average single-cell
expression estimates in either of two technical replicates (LPS 2 h (a), LPS 4 h
(b)) or two biological replicates (unstimulated/LPS 0 h (c), LPS 4 h (d)) for all
genes (top), immune response genes (middle), or non-immune response genes
(bottom). e, f, QQ plots (top) and MA plots (bottom) showing the similarity
in expression estimates for the two LPS 2 h technical replicates (e) or the two
LPS 4 h technical replicates (f). Plots are provided across all genes (left),
non-immune response genes (middle), or immune response genes (right).
g, h, QQ plots (top) and MA plots (bottom) showing the similarity in
expression estimates for all cells (including cluster-disrupted cells) in the two
LPS 0 h/unstimulated biological replicates (g) or the two LPS 4 h biological
replicates (h) across either all genes (left), non-immune response genes
(middle), or immune response genes (right). Note that slight variations in the
fraction of cluster-disrupted cells and the activation state of one of the two 0 h
samples result in mild deviations between immune response gene estimates in
those biological replicates. i, j, PCA for the two LPS 4 h technical replicates.
i, The first two principal components (PC1 and PC2, x and y axis, respectively)
of a PCA from the two LPS 4 h stimulation technical replicates (blue: replicate 1;
red: replicate 2). j, The distributions of scores for cells from each
of the two technical replicates on each of the first five PCs (left to right: PC1
to PC5).
Extended Data Figure 4 | Cluster disruption. a, Single-cell expression
distributions for SerpinB6b (a positive marker of cluster disruption) and Lyz1
(a negative marker of cluster disruption) at each time point (marked on top)
after stimulation with LPS (all cells included, see Supplementary Information).
Distributions are scaled to have the same maximum height. b, Difference in
mRNA expression as measured by qRT-PCR (with a Gapdh control) between
Lyz1 or SerpinB6b in cells pre-sorted before stimulation on the presence or
absence of CD83 expression (CD83+ and CD83−, respectively), a known cell
surface marker of cluster-disrupted cells (see Supplementary Information).
Pre-sorted cells were then either unstimulated (blue) or stimulated (red) with
LPS for 4 h. c, Expression of cluster-disruption markers does not change with
stimulation. RT-PCR showing the difference between Gapdh (control) and
Lyz1 or SerpinB6b expression in cells pre-sorted on CD83 either in the presence
or absence of stimulation with LPS. d, PCA showing the separation between
‘maturing’ (blue dots) and cluster-disrupted (red dots) cells. e, Expression of
cluster-disruption markers for cells stimulated with LPS on-chip. For each cell
(black dot) stimulated with LPS on-chip, shown are the expression levels
(x axis) of SerpinB6b (cluster-disruption cell marker, left) and Lyz1 (normal
maturing cell marker, right) versus its antiviral score (y axis). With one
exception, the cells are clearly maturing and not cluster-disrupted. Red shading:
range of expression in cluster-disrupted cells.
Extended Data Figure 5 | Time-dependent behaviours of single cells from
different modules and stimuli. a–c, For each of the three key modules, core
antiviral, Id (a), peaked inflammatory, IIIc (b), and sustained inflammatory,
IIId (c), shown are wave plots of all of its constituent genes in
bone-marrow-derived dendritic cells stimulated with PAM (top), LPS (middle), or PIC
(bottom) for 0, 1, 2, 4 and 6 h (left to right). x axis: expression level,
ln(TPM + 1); y axis: genes; z axis: single-cell expression density (proportion
of cells expressing at that level). Genes are ordered from lowest to highest
average expression value at the 4 h LPS time point. d, Contributions of each
module to measured variation. Significance of the contribution of modules
Ia–Id and IIIa–IIId from Fig. 1 to the variation measured throughout the
stimulation time course. Shown is the P value (Mann-Whitney test) of the
tested association between each gene module and the first three PCs, calculated
using a statistical resampling method (see Supplementary Information).
Only the core antiviral, maturity, and peaked/sustained inflammatory clusters
show statistically significant enrichments with the three PCs. e, Gene modules
show coherent shifts in single-cell expression. Shown are heat maps of scaled
α (left), μ (middle), and σ² (right) values (colour bar, bottom) in each time
course (LPS, PAM, PIC) for the genes in each of the three key modules (rows,
modules marked on left). Heat maps are row-normalized across all three
stimuli, with separate scalings for each of the three parameters, to highlight
temporal dynamics. Genes are clustered as in Fig. 1. f, Dynamic changes in
variation during stimulation for each module. For each module (rows) and
stimulus (columns), shown are bar plots of the fraction of genes (y axis) with a
significant change only in α (by a likelihood ratio test, P < 0.01, blue), only in μ
(Wilcoxon test, P < 0.01, green), or in both (each test independently, light
blue), at each transition (x axis), in different conditions (marked on top),
separated by whether they increase or decrease during that transition. In each
module and condition, the proportion is calculated out of the genes in the
module that are significantly bimodal (by a likelihood ratio test) in at least
one time point during the LPS response and are expressed in at least 10 cells in
both conditions. Their number is marked on top of each bar; conditions with 3
or fewer genes changing are semi-transparent.
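The bookkeeping behind panel f can be sketched as follows: for each gene that passes the stated filters, a change between two conditions is called in α if its likelihood ratio test gives P < 0.01 and in μ if its Wilcoxon test gives P < 0.01, with the two tests applied independently. The function below assumes those P values and the direction of each change were computed elsewhere; its name and input format are hypothetical.

```python
def classify_changes(genes, alpha_p, mu_p, increased, threshold=0.01):
    """Fraction of genes changing only in alpha, only in mu, or in both,
    split by direction of change.

    genes: genes already filtered (bimodal at >= 1 LPS time point and
           detected in >= 10 cells in both conditions)
    alpha_p, mu_p: dict gene -> P value of the alpha test / the mu test
    increased: dict gene -> True if the gene goes up in this transition
    """
    tally = {
        (kind, direction): 0
        for kind in ("alpha_only", "mu_only", "both")
        for direction in ("up", "down")
    }
    for g in genes:
        a = alpha_p[g] < threshold
        m = mu_p[g] < threshold
        if not (a or m):
            continue  # no significant change in either parameter
        kind = "both" if a and m else ("alpha_only" if a else "mu_only")
        tally[(kind, "up" if increased[g] else "down")] += 1
    n = len(genes)
    # Proportions are taken over all filtered genes, as in the caption.
    return {key: count / n for key, count in tally.items()} if n else tally
```

Calling it once per module, stimulus and transition would give the bar heights, with `len(genes)` playing the role of the number marked above each bar.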
Extended Data Figure 6 | Comparison of single-cell RNA-seq to RNA-FISH.
a–f, Single-cell mRNA expression distributions by RNA-FISH and
single-cell RNA-seq. a, Representative images of genes analysed by RNA-FISH
at 1 h and 4 h after LPS stimulation. b–f, mRNA expression distributions for
the housekeeping gene B2m (b), the peaked inflammatory gene Cxcl1 (c), the
core antiviral gene Ifit1 (d), the sustained inflammatory gene Il6 (e), and the
peaked inflammatory gene Tnf (f) measured using either single-cell RNA-seq
(top, black curves) or RNA-FISH (black histograms; no smoothing) in either
unstimulated cells (LPS 0 h) or cells stimulated with LPS for 1, 2, 4 or 6 h.
g–j, Determining the detection limit of single-cell RNA-seq by comparison
to RNA-FISH. For each of 25 genes, we compared single-cell RNA-seq data 
(y axis, this study) to RNA-FISH data (x axis, from Shalek et al.) based on either 
Extended Data Figure 7 | Fitting gene expression distributions. a, Flow
chart of model fitting. Shown are the key steps in fitting the 3-parameter model.
b, Examples of cases where fitting a multimodal distribution is required.
Single-cell expression distributions for (top to bottom) Car13, Rgs1, Ms4a6c
and Klf6 at (left to right) 1, 2, 4, and 6 h (marked on top) after stimulation with
LPS. Distributions are scaled to have the same maximum height. Data: black
lines; bimodal fits: grey lines; multimodal fits: blue lines. P values
(colour-coded) calculated using a goodness-of-fit test (a low P value rejects the fit; see
Supplementary Information). c–e, Reproducibility of gene-specific fitting of the
undetected mode, when fitting a mix of two normal distributions to all data
points, including those with ln(TPM + 1) < 1. c, d, Scatter plots showing the
correlation between μ1 and μ2 estimates for the two LPS 4 h technical replicates
(Supplementary Information), where μ1 and μ2 are the two component means
(in decreasing order of magnitude) of the two mixed normal distributions.
Estimates for μ2 correlate poorly between technical replicates, particularly
when focusing on genes for which μ2 is greater than 1 (e), suggesting that the
current data set does not support the use of this additional fit parameter.
f, Robustness of α estimates to small deviations in the threshold. Scatter plots
showing the correlation between α estimates determined when using a cut-off
of ln(TPM + 1) = 1 (x axis) versus when using a cut-off of ln(TPM + 1) = 0.25
(y axis, left), 0.5 (y axis, middle) or 2 (y axis, right) for the LPS time course
(top to bottom: 1 h, 2 h, 4 h and 6 h). g, Saturation curves for estimates of μ, σ²,
and α. Box plots depicting the Pearson correlation coefficient between α (top),
μ (middle), or σ² (bottom) in two LPS 4 h technical replicates, as a function of
the number of cells randomly drawn from each replicate (full details in
Supplementary Information). Plots are shown for all genes (left), as well as
those detected in more than 10 (middle) or 30 cells (right) in both replicates
(full data sets). h, i, Correcting for the relationship between mean expression
and average detection. h, The probability of detecting a transcript (y axis) in a
cell as a function of μ (x axis). Black and grey curves are two illustrative cells from
the LPS 4 h time point. i, Differences in a stringently corrected MLE
estimate of α (Supplementary Information) across the LPS time course. Shown
are the box plots of these corrected α values (y axis) for bimodally expressed genes
(determined by a likelihood ratio test, Supplementary Information) at each
time point (1, 2, 4, and 6 h) following LPS stimulation (x axis), as well as for the
on-chip 4 h LPS stimulation, for each of the core antiviral (left), peaked
inflammatory (middle) and sustained inflammatory (right) modules. Stars
represent intervals where there is a significant difference in a parameter
between two consecutive time points, as determined by a Wilcoxon rank sum 
test (single star: P< 10 ; double star: P< 107°). j-I, Estimating an upper 
bound on α using a likelihood test. For each of three transcripts (Ifit1 (j); Rsad2
(k); and Cxcl1 (l)), shown are their expression distributions (red, left) and the
matching likelihood function for a stringent upper estimate of α (blue dots,
right), when considering a null model where expression is distributed in a
lognormal fashion and any deviations are due to technical detection limits
(Supplementary Information). Red vertical line: stringent α estimate; black vertical line:
nominal α. Vertical green bars signify the nominal estimation of α, representing
the fraction of cells with detected expression of a transcript.




[Extended Data Figure 8 image: scatter plots comparing parameter estimates between 2h LPS and 4h LPS technical replicates and between 0h LPS and 4h LPS biological replicates, for all genes and for genes detected in > 10, > 20, > 30, > 40 and > 50 cells, each annotated with a correlation coefficient R; per-gene differences plotted against the number of expressing cells (average of two repeats) and against bulk expression level, ln(TPM) (average of two repeats); cumulative distributions of shifts between LPS 2h and LPS 4h compared with shifts between technical or biological replicates for the core antiviral, peaked inflammatory and sustained inflammatory modules.]


Extended Data Figure 8 | Reproducibility of estimated μ, σ² and α parameters. a–f, Reproducibility of estimated μ, σ² and α parameters between technical replicates. Scatter plots showing the relation between the estimated α (a), μ (b), and σ² (c) values for the two unstimulated/LPS 0h technical replicates. For μ (b) and σ² (c), estimates are plotted for all genes (farthest on the left), as well as (left to right) for genes only detected in more than 10, 20, 30, 40 or 50 cells, respectively. d–f show the same plots for the two LPS 4h technical replicates. g–l, Reproducibility of estimated μ, σ² and α parameters between biological replicates. Scatter plots showing the relation between the α (g), μ (h), and σ² (i) value estimates for the two LPS 2h biological replicates. For μ (h) and σ² (i), estimates are plotted for all genes (farthest on the left), as well as (left to right) for genes only detected in more than 10, 20, 30, 40 or 50 cells, respectively. j–l show the same plots for the two LPS 4h biological replicates. m, n, Relationship between per-gene errors for μ, σ² and α and the number of cells in which the gene's expression is detected, or its bulk expression level. Scatter plots displaying the differences in the σ² (left), μ (middle) and α (right) estimates for each gene between technical replicates for LPS 2h (m) or LPS 4h (n) (y axis) as a function of either the number of cells in which the transcript is detected (x axis, for μ and σ²), or the transcript's bulk expression level (TPM, x axis, for α). Notably, σ² (left) estimates saturate (denoted by a magenta line and shaded box) after ~30 detectable events, whereas μ estimates saturate after ~10. Dashed orange line: 95% confidence interval. o, p, Changes in μ, σ² and α between time points are significant as compared to null models from both technical and biological replicates. Shown are the cumulative distribution functions (CDF) for shifts in μ (left), σ² (middle), and α (right) between 2h and 4h (red CDF) for the core antiviral (top), peaked inflammatory (middle), and sustained inflammatory (bottom) modules compared to a null model (black CDF) derived from either technical (o) or biological (p) replicates (Supplementary Information). In the vast majority of cases, the changes between time points are significant, as assessed by a Kolmogorov–Smirnov (KS) test (P value of test in the upper left corner of each plot).
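Panels o and p compare the empirical CDF of parameter shifts between 2h and 4h with a null CDF built from replicate-to-replicate shifts. As an illustration only, the plain-Python sketch below computes the two-sample Kolmogorov–Smirnov statistic D, the largest vertical gap between two empirical CDFs; the sample values are made up and are not taken from the figure. For a P value one would normally call `scipy.stats.ks_2samp`, which returns both the statistic and the P value.

```python
from bisect import bisect_right


def ecdf_at(sorted_values, x):
    """Empirical CDF of sorted_values evaluated at x (right-continuous)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is reached at one of the pooled observations.
    return max(abs(ecdf_at(a, x) - ecdf_at(b, x)) for x in a + b)


# Hypothetical shifts: between time points versus between replicates (null)
time_shift = [0.4, 0.9, 1.1, 1.6, 2.0]
null_shift = [-0.3, 0.0, 0.1, 0.2, 0.5]
print(ks_statistic(time_shift, null_shift))
```

Evaluating only at pooled observations keeps the computation exact without needing a fine grid.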
Extended Data Figure 9 | Ifnb1 expression, production, and precocious
cells. a, b, Ifnb1 mRNA expression and the effect of IFN-β on variation.

a, Single-cell expression distributions for the Ifnb1 transcript at each time point 
(top) after stimulation with PAM (top, green), LPS (middle, black), or PIC 
(bottom, magenta). Distributions are produced with the density function in 
R with default parameters, and scaled to have the same maximum density. 

b, For each of three modules defined in Fig. 1 (core antiviral, top; peaked 
inflammatory, middle; sustained inflammatory, bottom), shown are bar plots of 
the fraction of genes (y axis) with a significant change only in α (by a likelihood
ratio test, P < 0.01, blue), only in μ (Wilcoxon test, P < 0.01, green), or in
both (each test independently, light blue) between the 2h LPS stimulation and
the 2h IFN-β stimulation, separated by whether they increase or decrease
during that transition. In each module and condition, the proportion is
calculated out of the genes in the module that are significantly bimodal (by a
likelihood ratio test) in at least one time point during the LPS response and are
expressed in at least 10 cells in both conditions. Their number is marked on top 
of each bar. c, d, Ifnb1 mRNA expression patterns and effect of cycloheximide. 
c, From top to bottom, population average Ifnb1 mRNA expression (top),
single-cell average Ifnb1 mRNA expression (second to top), α (second to
bottom) and μ (bottom) estimates for Ifnb1 for each stimulation condition in
Fig. 1. Grey star at 6h for PIC denotes uncertainty due to the small number of 
cells captured at that time point. d, Shown are box plots of the core antiviral 
scores (population level, see Supplementary Information) after a 4h LPS 
stimulation either where cycloheximide was added at the time of stimulation 
(right, blue), or during a standard 4h LPS control (left, green). Core antiviral 
expression is dramatically decreased by the addition of cycloheximide, 
suggesting that newly produced protein is needed to initiate the antiviral 
response. e, Relationship between core antiviral gene expression and Ifnb1 
mRNA expression during the LPS, PAM and PIC stimulation time courses and 
in follow-up experiments. Shown are the expression of core antiviral genes 
(heat maps: rows, gene; columns, cells) for cells stimulated for 0, 1, 2, 4 or 6h 
(left to right) with LPS (top), PAM (middle), or PIC (bottom). Beneath each 
heat map, grey bars depict the core antiviral score (middle panel, see 


Supplementary Information) and blue dots show Ifnb1 mRNA expression for 
each cell in every heat map. f-k, Identifying the precocious cells. f, Core 
antiviral scores for cells stimulated with LPS, PIC, or PAM. Shown are violin 
plots of the core antiviral module scores (Supplementary Information, y axis) 
for each cell from time course experiments (from left: 0, 1, 2, 4 and 6h) of 
dendritic cells stimulated with LPS (top), PIC (middle) or PAM (bottom). Two 
precocious cells (yellow stars, top panel) have unusually high antiviral scores at 
1h LPS; similarly precocious cells can be seen in PIC at 1h
and 2h (orange stars, middle) or in PAM at 2h (turquoise stars, bottom). 

g, Precocious cells in all three responses are typical maturing cells. PCA 
showing the separation between maturing (blue dots) and cluster-disrupted 
(red dots) cells (top), as well as only maturing (middle) or only cluster- 
disrupted (bottom) cells (all as also shown in Extended Data Fig. 4d). The 
precocious cells from each of the responses are marked as stars (colours as in 
(f)), and all fall well within the maturing cells. h, Precocious cells in all three 
responses express Lyz1 and do not express SerpinB6b. Shown are mRNA 
expression distributions for SerpinB6b (cluster disruption cell marker, left)
and Lyz1 (normal maturing cell marker, right) in LPS 1h, PIC 1h and 2h, and 
PAM 1h (top to bottom). The typical range for expression in cluster-disrupted 
cells is shaded in red. The precocious cells from each of the responses are 
marked as stars (colours as in (f)), and all fall well within the maturing cells. 
i, Normal quantile plots of the expression of genes from the core (cluster Ig, left) 
and secondary (cluster I, right) antiviral clusters at 1 h LPS. The two precocious 
cells (yellow stars) express unusually high levels of core antiviral genes (left) 
but not of secondary genes (right). j, k, The precocious cells are only 
distinguished by the expression of ~100 core antiviral genes. Shown are the 
distributions of scores for each of the first six PCs (right) for samples collected 
after stimulation with LPS for 1h with (j) or without (k) the inclusion of 
core antiviral genes. Precocious cells (vertical red bars), normally distinguished 
by the third and fourth principal components (j), become indistinguishable
from all other cells if the ~100 core antiviral genes are excluded (k) before
performing the PCA. 
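Panel a of this figure builds its curves with R's `density` function at default settings (Gaussian kernel, `bw.nrd0` bandwidth, 512 grid points extending 3 bandwidths beyond the data) and then rescales each curve to a common maximum. A rough Python equivalent is sketched below under those assumptions. It sums the kernels directly instead of using R's FFT-based binning, so its values can differ slightly from R's output.

```python
import math
import statistics


def bw_nrd0(x):
    """Silverman's rule-of-thumb bandwidth, mirroring R's bw.nrd0."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")  # R quantile type 7
    lo = min(sd, (q3 - q1) / 1.34)
    if lo == 0:  # same fallbacks as R: sd, then |x[0]|, then 1
        lo = sd or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** -0.2


def scaled_density(x, n_points=512, cut=3.0):
    """Gaussian KDE on an even grid, rescaled so that its peak equals 1."""
    h = bw_nrd0(x)
    start, stop = min(x) - cut * h, max(x) + cut * h
    step = (stop - start) / (n_points - 1)
    grid = [start + i * step for i in range(n_points)]
    norm = 1.0 / (len(x) * h * math.sqrt(2.0 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x)
            for g in grid]
    peak = max(dens)
    return grid, [d / peak for d in dens]


# Hypothetical single-cell Ifnb1 values at one time point
grid, dens = scaled_density([0.0, 0.2, 0.3, 2.5, 2.7])
```

Rescaling every curve to a peak of 1 makes distribution shapes comparable across time points even when the number of cells differs.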
Extended Data Figure 10 | Characterizing the precocious cells. a, Core
antiviral, peaked inflammatory, and sustained inflammatory module scores 
during the LPS time course and follow-up experiments. Shown are violin plots 
of the scores (y axis) for the core antiviral (Supplementary Information, top), 
peaked inflammatory (Supplementary Information, middle), and sustained 
inflammatory (Supplementary Information, bottom) modules for cells in each 
of the experiments (from left to right): LPS 0h, LPS 1h, LPS 2h, LPS 2h 
technical replicate 1, LPS 2h technical replicate 2, LPS 4h, LPS 4h technical 
replicate 1, LPS 4h technical replicate 2, LPS 4h biological replicate, LPS 6 h, 
IFN-β 2h, on-chip unstimulated, on-chip LPS 4h, LPS 4h with GolgiPlug at
0h, LPS 4h with GolgiPlug at 1h, LPS 4h with GolgiPlug at 2h, LPS 4h with
Ifnar1−/− dendritic cells, and LPS 4h with Stat1−/− dendritic cells. Yellow stars:
the two precocious cells at 1h LPS. b, Changes in expression and variation 
during stimulation in the on-chip 4h LPS stimulation. For genes in the (from 
top to bottom) core antiviral, maturity, peaked inflammatory and sustained 
inflammatory modules, shown are bar plots of the fraction of genes (y axis) with 
a significant change only in α (by a likelihood ratio test, P < 0.01, blue), only in
μ (Wilcoxon test, P < 0.01, green), or in both (each test independently, light
blue) between the 4h on-chip LPS stimulation and the 4h in-tube LPS 
stimulation separated by whether they increase or decrease during that 
transition. In each module and condition, the proportion is calculated out of the 
genes in the module that are significantly bimodal (by a likelihood ratio test) 
in at least one time point during the LPS response and are expressed in at least 
10 cells in both conditions. Their number is marked on top of each bar. c-f, Bar 
plots, as in b, for a 4h wild-type LPS stimulation (in-tube) and either a 4h 
in-tube LPS stimulation where GolgiPlug was added 2 h after LPS (c), a 4h LPS
stimulation of Ifnar1−/− dendritic cells (d), a 4h LPS stimulation of Stat1−/−
dendritic cells (e), or a 4h LPS stimulation of Tnfr−/− dendritic cells (f).


g, Genetic perturbations alter expression and variation in different 
inflammatory and antiviral modules. Shown is the expression of the genes 
(rows) in, from top to bottom: the core antiviral (Iq), maturity (III,), peaked 
inflammatory (III.), and sustained inflammatory (IIIq) modules in single 
cells (columns) in, from left to right: the in-tube, on-chip, Ifnar1−/−, Stat1−/−,
and Tnfr−/− conditions. Yellow/purple colour scale: scaled expression values
(z-scores). h, Scores of the peaked inflammatory module for Ifnar1−/− dendritic
cells with recombinant cytokines. Shown are box plots of the peaked 
inflammatory module scores (Supplementary Information, y axis) for three 
population-level replicates of a 4h LPS stimulation of Ifnar1−/− dendritic cells to
which a recombinant cytokine (x axis) has been added at 2 h after stimulation. 
Notably, adding IL-10 significantly reduces the peaked inflammatory module. 
i, Stat1 knockout affects expression and variation of peaked inflammatory 
genes. Shown are expression distributions for five peaked inflammatory 
genes after 4h of LPS stimulation in each of three conditions: in-tube 
stimulation of wild-type dendritic cells (control; left), on-chip stimulation of 
wild-type cells (no cell-to-cell signalling; middle), and a stimulation of dendritic 
cells from Stat1−/− mice (performed in-tube; right). j, k, Population-level
paracrine signalling enhances and coordinates the core antiviral response while
dampening and desynchronizing the peaked inflammatory one. j, Gene
network model showing how positive IFN-β signalling induces the antiviral
response and reduces its heterogeneity, while simultaneously activating 
negative paracrine feedback, possibly including IL-10, which dampens the 
peaked inflammatory cluster and increases its heterogeneity. k, Cell population 
model showing how positive and negative paracrine signalling alter antiviral 
(magenta) and inflammatory (green) gene expression variability across cells. 
Grey denotes no expression. 
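Panels b–f tally, for each module, the fraction of eligible genes whose change is significant only in α (likelihood ratio test), only in μ (Wilcoxon test), or in both, at P < 0.01, split by direction of change. A minimal Python sketch of that bookkeeping follows, assuming the per-gene P values, a single direction of change and the eligibility criteria (bimodality at one or more time points, detection in at least 10 cells in both conditions) have already been computed upstream. The `GeneTest` record and its fields are hypothetical names, and the statistical tests themselves are not implemented here.

```python
from dataclasses import dataclass


@dataclass
class GeneTest:
    """Per-gene test results for one comparison (hypothetical record)."""
    name: str
    p_alpha: float  # likelihood ratio test P value for a change in alpha
    p_mu: float     # Wilcoxon test P value for a change in mu
    direction: int  # +1 if expression increases, -1 if it decreases (assumed)
    eligible: bool  # bimodal at >= 1 time point and detected in >= 10 cells in both conditions


def change_fractions(genes, threshold=0.01):
    """Fraction of eligible genes changing only in alpha, only in mu, or in both."""
    eligible = [g for g in genes if g.eligible]
    counts = {}
    for g in eligible:
        in_alpha, in_mu = g.p_alpha < threshold, g.p_mu < threshold
        if in_alpha and in_mu:
            kind = "both"
        elif in_alpha:
            kind = "alpha only"
        elif in_mu:
            kind = "mu only"
        else:
            continue  # no significant change, but still counted in the denominator
        key = (kind, "increase" if g.direction > 0 else "decrease")
        counts[key] = counts.get(key, 0) + 1
    n = len(eligible)
    fractions = {key: c / n for key, c in counts.items()} if n else {}
    return n, fractions
```

The returned `n` plays the role of the gene count printed above each bar.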
The mitochondrial deubiquitinase USP30
opposes parkin-mediated mitophagy 


Baris Bingol*, Joy S. Tea*, Lilian Phu, Mike Reichelt, Corey E. Bakalarski, Qinghua Song, Oded Foreman,
Donald S. Kirkpatrick & Morgan Sheng


Cells maintain healthy mitochondria by degrading damaged mitochondria through mitophagy; defective mitophagy 
is linked to Parkinson’s disease. Here we report that USP30, a deubiquitinase localized to mitochondria, antagonizes 
mitophagy driven by the ubiquitin ligase parkin (also known as PARK2) and protein kinase PINK1, which are encoded 
by two genes associated with Parkinson’s disease. Parkin ubiquitinates and tags damaged mitochondria for clearance. 
Overexpression of USP30 removes ubiquitin attached by parkin onto damaged mitochondria and blocks parkin’s ability to 
drive mitophagy, whereas reducing USP30 activity enhances mitochondrial degradation in neurons. Global ubiquitination 
site profiling identified multiple mitochondrial substrates oppositely regulated by parkin and USP30. Knockdown of 
USP30 rescues the defective mitophagy caused by pathogenic mutations in parkin and improves mitochondrial integ- 
rity in parkin- or PINK1-deficient flies. Knockdown of USP30 in dopaminergic neurons protects flies against paraquat 
toxicity in vivo, ameliorating defects in dopamine levels, motor function and organismal survival. Thus USP30 inhibition is 
potentially beneficial for Parkinson’s disease by promoting mitochondrial clearance and quality control. 


Mitophagy, a specialized autophagy pathway that mediates the clear- 
ance of damaged mitochondria by lysosomes, is important for mito-
chondrial quality control’. Defective mitochondria, if left uncleared, can
be a source of oxidative stress and compromise the health of the entire 
mitochondrial network. 

Parkinson’s disease is characterized prominently, but not solely, by 
loss of dopaminergic neurons in the substantia nigra. Although the patho- 
genic mechanisms of Parkinson’s disease are unclear, several lines of 
evidence suggest that mitochondrial dysfunction is central to the disease’. 
Most compellingly, familial Parkinson’s disease can be caused by muta- 
tions in the ubiquitin ligase parkin and protein kinase PINK1*“, both of 
which maintain healthy mitochondria via regulating mitochondrial 
dynamics and quality control’. Genetic studies established that PINK1 
acts upstream of parkin®’. PINK1 recruits parkin from the cytoplasm
to the surface of damaged mitochondria, leading to parkin-mediated 
ubiquitination of mitochondrial outer membrane proteins and removal 
of damaged mitochondria by mitophagy* ’°. Parkinson’s disease-associated 
mutations in PINK1 or parkin impair parkin recruitment, mitochon- 
drial ubiquitination, and/or mitophagy*”””’. In the context of the inher- 
ently high mitochondrial oxidative stress in substantia nigra dopamine 
neurons”, loss of parkin-mediated mitophagy could explain the greater 
susceptibility of substantia nigra neurons to neurodegeneration. Thus, 
promoting mitophagy and enhancing mitochondrial quality control 
could benefit dopaminergic neurons. To this end, we performed a screen 
for deubiquitinase enzymes (DUBs) that function in opposition to 
parkin and identified USP30, a mitochondria-localized DUB, as an antag- 
onist of parkin-mediated mitophagy. 


USP30 antagonizes mitophagy 

We screened a Flag-tagged human DUB complementary DNA library 
in a well-established mitochondrial degradation assay’: in cultured cells 
overexpressing parkin, mitochondrial depolarization induced by the
protonophore carbonyl cyanide 3-chlorophenylhydrazone (CCCP)
results in loss of mitochondria (measured by immunostaining for mito- 
chondrial outer membrane protein TOM20, also known as TOMM20). 
CCCP caused a robust disappearance of TOM20 staining in more than 
80% of cells transfected with green fluorescent protein (GFP)-conjugated 
parkin (Fig. 1a). When we tested individual cDNAs from a library of about
100 different DUBs, only two (USP30 and DUBA2, also known
as OTUD6A) robustly blocked this loss of TOM20 staining (Fig. 1a).
We focused on USP30 because it was reported to be localized in the 
mitochondrial outer membrane with its enzymatic domain putatively 
facing the cytoplasm“. We confirmed specific mitochondrial association 
of USP30 (Extended Data Fig. 1a–c). Thus, USP30 is in the right subcel-
lular compartment to counteract the action of parkin on mitochondria.

The ability of USP30 overexpression to prevent CCCP-induced mito-
phagy was further confirmed in dopaminergic SH-SY5Y cells transfected
with MYC-parkin (Fig. 1b). In addition to TOM20, USP30 overexpres- 
sion also blocked the CCCP-induced loss of HSP60 (a mitochondrial 
matrix protein, also known as HSPD1), indicating that USP30 antag- 
onizes en masse degradation of mitochondria (Fig. 1b-d). A catalytically 
inactive USP30(C77S) mutant"* was ineffective at preventing parkin- 
mediated mitochondrial degradation, indicating that USP30 counteracts 
mitophagy through deubiquitination (Fig. 1b-d). Consistently, USP30 
overexpression reduced accumulation of ubiquitin signal on mitochon- 
dria in CCCP-treated GFP-parkin-expressing cells, dependent on USP30 
catalytic activity (Extended Data Fig. 2a, b). USP30 co-expressed with 
parkin also reduced CCCP-induced recruitment of autophagy markers 
p62 and LC3-GFP’*”’ to parkin-associated mitochondria (Extended 
Data Fig. 2c-f). Co-expression of USP30 did not alter parkin express- 
ion level or the translocation of parkin to mitochondria (Extended Data 
Fig. 1d, e; Fig. 1b, e). These data indicate that USP30 functions as a DUB
that opposes ubiquitination of mitochondrial proteins by parkin, thereby 
inhibiting mitophagy. 
5Department of Non-Clinical Biostatistics, Genentech, Inc., South San Francisco, California 94080, USA.


*These authors contributed equally to this work. 


370 | NATURE | VOL 510 | 19 JUNE 2014 




[Figure 1 image: a, cells transfected with GFP-parkin plus control (β-Gal), USP30-Flag or DUBA2-Flag and treated with CCCP for 24 h, stained for parkin, TOM20 and Flag; c, d, quantification of TOM20 and HSP60 for control (β-Gal), USP30 and USP30(C77S).]


Figure 1 | USP30 antagonizes parkin-mediated mitophagy. 

a, b, Immunostaining of HeLa (a) or SH-SY5Y (b) cells transfected as indicated
and treated with CCCP (20 μM, 24h). Scale bars, 10 μm. c–e, Quantification of
percent of cells with TOM20 or HSP60 staining (c), fold change in TOM20 or


PINK1, parkin required for mitophagy 


To measure mitophagy in neurons, we monitored mt-Keima, a ratiometric
pH-sensitive fluorescent protein that is targeted into the mitochondrial
matrix. A low mt-Keima fluorescence ratio (543 nm/458 nm) reports a
neutral environment, whereas a high ratio reports acidic pH’. Thus,
mt-Keima enables differential imaging of
mitochondria in the cytoplasm and mitochondria in acidic lysosomes. 
Because mt-Keima is resistant to lysosomal proteases’®, it allows for mea- 
surement of cumulative lysosomal delivery of mitochondria over time. 

In cultured rat hippocampal neurons, mt-Keima signal accumu- 
lated in elongated structures characteristic of mitochondria with low 
543 nm/458 nm ratio values (Extended Data Fig. 3a, shown in green), 
and in multiple round structures throughout the cell body with high 
ratio (acidic) signal (Extended Data Fig. 3a, red). We confirmed these 
round mt-Keima-positive structures are most likely lysosomes,
as previously described (Extended Data Fig. 3b-d)'*. Since almost all 
of the ‘acidic’ mt-Keima signal was found in neuronal cell bodies, we 
used the ratio of the area of lysosomal (red) signal/mitochondrial (green) 
signal within the cell body as a measure of lysosomal delivery of mito- 
chondria (‘mitophagy index’)’*. As quantified by this mitophagy index, 
the abundance of mt-Keima in lysosomes increased over a time course 
of days (Extended Data Fig. 3e), indicating ongoing mitophagy in cul- 
tured neurons under basal conditions. 
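The mitophagy index is an area ratio computed within the cell body. The Python sketch below shows that arithmetic under stated assumptions: each mt-Keima-positive pixel already has a 543 nm/458 nm ratio and a cell-body flag, and a single `threshold` separates acidic lysosomal signal from neutral mitochondrial signal. The threshold is a hypothetical parameter; the text does not say how pixels were classified.

```python
from typing import Sequence


def mitophagy_index(ratio: Sequence[float], in_cell_body: Sequence[bool],
                    threshold: float) -> float:
    """Lysosomal (high-ratio) area divided by mitochondrial (low-ratio) area.

    ratio: 543 nm / 458 nm mt-Keima intensity ratio for each mt-Keima-positive pixel.
    in_cell_body: True for pixels inside the cell body.
    threshold: hypothetical ratio cut-off separating acidic from neutral signal.
    Areas are pixel counts, so a constant pixel size cancels in the ratio.
    """
    if len(ratio) != len(in_cell_body):
        raise ValueError("ratio and in_cell_body must have the same length")
    soma = [r for r, inside in zip(ratio, in_cell_body) if inside]
    lysosomal = sum(1 for r in soma if r > threshold)
    mitochondrial = len(soma) - lysosomal
    if mitochondrial == 0:
        raise ValueError("no mitochondrial (low-ratio) signal in the cell body")
    return lysosomal / mitochondrial


# Six hypothetical pixels; the fourth lies outside the cell body
print(mitophagy_index([0.2, 0.3, 1.5, 2.0, 0.1, 3.0],
                      [True, True, True, False, True, True], threshold=1.0))
```

Restricting the count to the cell body mirrors the observation that almost all acidic mt-Keima signal sits there.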

In heterologous cells, parkin overexpression can drive mitochondrial
degradation upon mitochondrial depolarization; however, it is not yet
established whether endogenous parkin and PINK1 are required for
mitophagy in either non-neural or neural cells’. To examine their roles
in neuronal mitophagy, we knocked down parkin or PINK1 in neuronal
cultures using small hairpin RNAs (shRNAs) (Extended Data Fig. 3f–i).
Compared to control luciferase shRNA, parkin or PINK1 shRNAs (either
of two independent sequences) reduced mitochondrial delivery to lyso-
somes (Fig. 2a–d). Consistent with the genetic epistasis in flies*”, PINK1
overexpression enhanced mitophagy in neurons, an effect that was com-
pletely eliminated by parkin knockdown (Extended Data Fig. 3j, k). On
the other hand, parkin overexpression by itself had no apparent effect
on basal mitophagy, as measured by the mt-Keima assay (Extended Data
Fig. 3l, m). Thus, neuronal mitophagy requires both PINK1 and parkin,
with PINK1, apparently the limiting factor, acting upstream of parkin.


USP30 antagonizes mitophagy in neurons 


Does USP30 suppress mitophagy in neurons? In the mt-Keima assay, 
USP30 overexpression caused a ~70% reduction in mitophagy index, 
indicating that USP30 inhibits lysosomal delivery of mitochondria in
neurons (Fig. 2e, f). In contrast, overexpression of enzymatically inact- 
ive USP30 (C77S or C77A) induced a robust increase in mitophagy 
(Fig. 2e, f), probably reflecting a dominant-negative action of catalyti- 
cally inactive USP30. To test the function of endogenous USP30, we 
knocked down USP30 using shRNAs (Fig. 2g, h). In neurons, the rat 
USP30 shRNA induced a modest increase in the area of individual mito-
chondria in dendrites (Extended Data Fig. 3n, o). More importantly,
USP30 knockdown enhanced mitophagy (~60% increase in mitophagy 
index) (Fig. 2i, j). This effect was ‘rescued’ by co-transfection of shRNA- 
resistant human USP30 cDNA, indicating that USP30 shRNA was not 
exerting a non-specific effect (Fig. 2g, i, j). In fact, neurons co-transfected
with human USP30 cDNA plus rat USP30 shRNA showed lower levels 
of lysosomal accumulation of mt-Keima than controls, similar to neurons 
overexpressing wild-type USP30 by itself (Fig. 2e, f). Moreover, human 
USP30(C77S) mutant failed to reverse the enhanced mitophagy induced 
by USP30 shRNA, and actually enhanced mitophagic activity even more 
than USP30 shRNA (Fig. 2i, j), the latter result suggesting that USP30 
knockdown is incomplete. In HEK-293 cells, autophagic flux (as mea- 
sured by dynamic levels of LC3-II and p62) was inhibited by USP30 over- 
expression, dependent on enzymatic activity, and enhanced by USP30 
knockdown (Extended Data Fig. 4, see Supplementary Results). Togeth- 
er, these results show that endogenous USP30 restrains mitophagy in 
cells through its DUB activity. 


USP30 deubiquitinates mitochondrial proteins 


Using mass spectrometry (MS) analysis following immunoaffinity 
enrichment of ubiquitinated peptides with ubiquitin branch-specific 
(K-GG) antibodies”, we identified 41 proteins whose ubiquitination 
was oppositely regulated by parkin and USP30 (see Supplementary 
Results, Supplementary Tables 1-3 and Extended Data Fig. 5a). We 
focused on two such mitochondrial proteins, TOM20 and MIRO1 (also
known as RHOT1), that showed large increases in ubiquitination with 
USP30 knockdown and parkin overexpression. To confirm that USP30 
can deubiquitinate these proteins, cell lines stably overexpressing GFP- 
parkin were transfected with haemagglutinin (HA)-conjugated ubiquitin, 
and ubiquitinated proteins were immunoprecipitated using anti-HA anti- 
bodies. Following mitochondrial depolarization with CCCP, GFP-parkin 
stable cells showed enhanced ubiquitination of endogenous MIRO1 and 
TOM20, as measured by immunoblotting for these proteins in the anti-
HA immunoprecipitates (Fig. 3a–c). Cotransfection of wild-type USP30,
but not DUB-dead USP30(C77S), strongly decreased the amount of ubi-
quitinated MIRO1 and TOM20 induced by CCCP (Fig. 3a–c). Consistent
Figure 2 | USP30 antagonizes mitophagy in neurons. a, c, e, i, mt-Keima
imaging in neurons transfected as indicated. Human USP30 cDNA is
insensitive to rat USP30 shRNA in i. b, d, f, j, Quantification of mitophagy
index from a, c, e, i. Kruskal–Wallis test. n = 112–169 (b), 54–80 (d), 80–155


with a dominant-negative mechanism, USP30(C77S) increased basal 
TOM20 ubiquitination approximately twofold, and CCCP-induced ubi- 
quitination approximately eightfold (Fig. 3a, c). CCCP did not induce 
detectable TOM20 or MIRO1 ubiquitination in the parental HEK-293 
cell line (lacking GFP-parkin) (Extended Data Fig. 5b). In this cell line, 
however, overexpression of USP30(C77S) was still able to enhance basal 
TOM20 ubiquitination and wild-type USP30 to suppress it (Extended 
Data Fig. 5b). USP30 overexpression in GFP-parkin stable cells signifi- 
cantly reduced CCCP-induced degradation of MIRO1 and TOM20, 
dependent on USP30 catalytic activity (Extended Data Fig. 5c, d). Taken 
together, our data indicate that MIRO1 and TOM20 are substrates of 
USP30, and that USP30 can counteract parkin-mediated ubiquitination 
and degradation of both MIRO1 and TOM20 following mitochondrial 
damage. Using the same experimental system (cells overexpressing GFP- 
parkin and HA-ubiquitin), we tested the function of endogenous USP30 
by shRNA knockdown. USP30 knockdown did not affect basal ubiqui-
tination of MIRO1 (in the absence of CCCP) (Fig. 3d). After mitochon-
drial depolarization, however, and consistent with the MS experiments, 
USP30 knockdown increased the level of ubiquitinated MIRO1 about 
2.5-fold (Fig. 3d, e). Notably, USP30 knockdown increased both basal 
and CCCP-induced TOM20 ubiquitination, similar to enzymatically 
inactive USP30 (Fig. 3d, f). The increase in MIRO1 and TOM20 ubi- 
quitination caused by USP30 shRNA was prevented by coexpression 
of the rat USP30 cDNA (which is insensitive to human USP30 shRNA), 
indicating the specificity of the RNA interference (RNAi) effect (Extended 
Data Fig. 5e). These biochemical data corroborate the MS findings that 
endogenous USP30 acts as a brake on ubiquitination of both MIRO1 
and TOM20. Because USP30 knockdown or expression of enzymati- 
cally inactive USP30 enhances mitophagy (Fig. 2e-j) and ubiquitination 
of TOM20 (Fig. 3), and TOM20 degradation accompanies mitophagy 
(Extended Data Fig. 5c, d), we speculated that TOM20 depletion might 
trigger mitophagy. In this model, TOM20 overexpression should block
mitophagy induced by USP30 knockdown. Instead, we found that over- 
expression of TOM20—even by itself—led to a robust increase in mitophagy 
(f), and 96–179 (j) cells. n = 5 (b), 4 (d), 8 (f), and 7 (j) experiments.

g, Immunoblots of HEK-293 cells transfected as indicated. h, Immunoblot of 
endogenous USP30 in rat hippocampal neurons infected with adeno-associated 
virus expressing USP30 shRNA. All scale bars, 5 μm. Error bars, s.e.m.


in the mt-Keima assay (Extended Data Fig. 6c), an effect similar to USP30 
knockdown. We therefore proposed that it is the ubiquitination of 
TOM20, rather than its degradation, that serves as the signal for mito- 
phagy, and that overexpression of TOM20 promotes mitophagy by 
increasing the pool of substrates available for ubiquitination. Consistent 
with this hypothesis, overexpression of TOM20-3KR (TOM20(K56R, 
K61R, K68R) with lysine-to-arginine mutations at three ubiquitination 
sites regulated by CCCP and USP30) failed to enhance mitophagy (Ex-
tended Data Fig. 6a–d). Thus TOM20 is sufficient to drive mitophagy,
but this ability depends on its ubiquitination. Moreover, TOM20-3KR 
blocked the increase in mitophagy induced by USP30(C77S) (Extended 
Data Fig. 6c, d), implying that the increased mitophagic flux caused by 
dominant-negative USP30 requires TOM20 ubiquitination. Alterna- 
tively, overexpressed TOM20-3KR may oppose USP30(C77S)-induced 
mitophagy by physically associating with USP30 in a non-catalytic man- 
ner. Taken together, these data suggest that ubiquitination of TOM20 
can promote mitophagy in neurons, and that inhibition of mitophagy 
by USP30 can be explained at least in part by TOM20 deubiquitination. 


USP30 is a parkin substrate 


Since parkin acts on proteins of the outer mitochondrial membrane 
and since USP30 resides at this location, we wondered whether USP30 
is itself a substrate of parkin. Supporting this possibility, we identified 
USP30-derived ubiquitinated peptides in MS experiments in GFP-parkin 
cells treated with CCCP (Extended Data Fig. 6e; Supplementary Table 1). 
We confirmed that parkin can ubiquitinate endogenous USP30 follow- 
ing CCCP treatment (Extended Data Fig. 6f, g). CCCP also induced a 
significant drop in USP30 levels in GFP-parkin cells (Extended Data 
Fig. 6h, i). USP30 degradation was inhibited by MG132 and epoxomicin, 
but not bafilomycin, suggesting that USP30 is degraded by proteasomes 
(Extended Data Fig. 6j, k). Importantly, parkin with pathogenic G430D 
or K161N mutations was unable to ubiquitinate or degrade USP30 (Extended Data Fig. 6f-i). These results suggest that ubiquitination and degradation of USP30 by parkin might contribute to mitophagy by removing a brake on mitophagy.

Figure 3 | USP30 and parkin act antagonistically on common substrates. a, d, Immunoblotting for MIRO1 and TOM20 in anti-HA immunoprecipitates from GFP-parkin HEK-293 cells transfected as indicated. b, c, Quantification of levels of ubiquitinated MIRO1 (b) and TOM20 (c) from a. One-way ANOVA, Bonferroni’s test. n = 6 (b) and 5 (c) experiments. e, f, Quantification of ubiquitinated MIRO1 (e) and TOM20 (f) from d. One-way ANOVA, Dunnett’s test. n = 4 (e) and 6 (f) experiments. Error bars, s.e.m.


USP30 knockdown rescues mitophagy defects 

To test whether suppressing USP30 could rescue the impaired mitochondrial degradation associated with pathogenic parkin mutations, we focused on two Parkinson’s disease-linked parkin mutants, G430D and K161N, which display defects in mitophagy. In SH-SY5Y cells transfected with GFP-parkin(G430D) and treated with CCCP, mitochondria failed to be cleared and instead formed perinuclear clusters in association with the defective parkin protein (Fig. 4a). The same cells doubly transfected with GFP-parkin(G430D) and USP30 siRNA, which reduced USP30 protein levels by ~60% (Extended Data Fig. 7a), showed a ~70% decrease in mitochondria (as measured by total TOM20 or HSP60 immunofluorescence) compared to cells transfected with GFP-parkin(G430D) and control siRNA (Fig. 4a, b). Mitochondrial degradation was not rescued by knockdown of other DUBs (USP6, USP14) (Extended Data Fig. 7b-d). Rescue of mitochondrial degradation correlated with loss of the perinuclear clusters of mutant parkin(G430D) (usually associated with mitochondria) and the appearance of smaller parkin-containing puncta throughout the cytoplasm (Fig. 4a, Extended Data Fig. 7b, c, e). Re-introduction of RNAi-resistant wild-type rat USP30 cDNA, but not the inactive rat USP30(C77S) mutant, prevented the rescue of mitochondrial degradation by USP30 siRNA (Extended Data Fig. 7e, f). The mitochondrial degradation defect associated with the parkin(K161N) mutant was similarly rescued by USP30 siRNA knockdown (Extended Data Fig. 7g, h). USP30 knockdown also increased levels of p62 and LC3 staining colocalizing with GFP-parkin(K161N) in CCCP-treated cells (Extended Data Fig. 7i-l). In neurons, reduced mitophagy due to parkin or PINK1 knockdown (as measured in the mt-Keima assay) was likewise rescued by dominant-negative USP30(C77A) (Fig. 4c, d; Extended Data Fig. 7m, n). Thus, suppressing the expression or activity of USP30 allows cells to overcome parkin deficiency and restore clearance of damaged mitochondria.

Figure 4 | USP30 knockdown rescues mitophagy defects associated with mutant parkin. a, Immunostaining of SH-SY5Y cells expressing GFP-parkin(G430D), transfected with USP30 siRNA and treated with CCCP (20 µM, 24 h). b, Quantification of fold change in TOM20 and HSP60 fluorescence intensity from a. Kruskal-Wallis test, n = 6 experiments. c, mt-Keima imaging in neurons transfected with parkin shRNA and USP30(C77A)-Flag. d, Quantification of mitophagy index from c. Kruskal-Wallis test, n = 71-77 cells; n = 3 experiments. All scale bars, 5 µm. Error bars, s.e.m.


USP30 knockdown protects in vivo 

What are the functional effects of USP30 knockdown? Reactive oxygen species largely derive from mitochondria, and mitochondrial dysfunction may contribute to increased oxidative stress in Parkinson’s disease. USP30 knockdown in neuronal cultures reduced the basal mitochondrial oxidation signal measured by ratiometric imaging with the mitochondrial redox potential sensor mito-roGFP (Extended Data Fig. 8a, b; see Methods), indicating that USP30 suppression can ameliorate basal mitochondrial oxidative stress. To test whether knocking down USP30 would provide protection under stress conditions in vivo, we used Drosophila, a model system for studying the molecular pathogenesis of Parkinson’s disease. To knock down fly USP30 (CG3016, hereafter called dUSP30), we used the GAL4/UAS system. We crossed an Actin-GAL4 driver to a UAS-dUSP30-RNAi transgenic line, which allows widespread expression of dUSP30 RNAi driven by the actin promoter (referred to as ‘dUSP30 knockdown’). Activation of UAS-dUSP30-RNAi by Actin-GAL4 led to an approximately 90% reduction of dUSP30 mRNA compared to the parental lines (Extended Data Fig. 8c). By crossing the ‘dUSP30 knockdown’ line with parkin (park”) or pink1 (pink1°’) mutant flies, we showed that dUSP30 knockdown in these mutant flies largely rescued the mitochondrial morphology defects (disorganized cristae and enlarged size) in their indirect flight muscles (Fig. 5a-c, Extended Data Fig. 8d-f). dUSP30 knockdown by itself did not affect mitochondrial morphology (Extended Data Fig. 8g). Thus, USP30 suppression can maintain mitochondrial health in the face of parkin or pink1 loss of function in vivo.

Unlike previous studies, we did not observe any dopaminergic neurodegeneration or dopamine depletion in parkin mutant Drosophila (Extended Data Fig. 9a-g); therefore we could not test the effect of dUSP30 knockdown on these phenotypes. As reported previously, pink1 mutant flies showed poor performance in a climbing assay and depletion of the neurotransmitter dopamine in their brains (Fig. 5d, e). dUSP30 knockdown driven by Actin-GAL4 ameliorated the climbing defect and prevented the dopamine depletion (Fig. 5d, e), thus benefiting pink1 mutant flies in both behavioural and neurochemical terms.

Figure 5 | USP30 knockdown provides protection in vivo. a, Transverse sections of Drosophila indirect flight muscles of indicated genotypes. Arrowheads, electron-dense mitochondria; dashed lines, ‘pale’ mitochondria with disorganized cristae. Scale bars, 1 µm (top), 0.2 µm (bottom panels). b, c, Quantification of mitochondrial morphology (b) and size distribution (c) from a. One-way ANOVA, Bonferroni's test (b). Kolmogorov-Smirnov test, park” versus ‘park” + dUSP30 knockdown’ (c). n = 3-4 flies per genotype.
d, e, Effect of dUSP30 knockdown on climbing (d) and dopamine levels (e) in pink1? flies. Kruskal-Wallis test, n = 13 (d) and 8 (e) experiments. f-j, Effect of dUSP30 knockdown, paraquat and human USP30 overexpression on climbing (f, i) and survival (g, h, j) in wild-type flies. Human USP30 is insensitive to dUSP30-RNAi in i and j. Kruskal-Wallis test (f, i), two-way ANOVA (g, h, j), n = 4-8 (f), 3-8 (g, h), 4 (i), and 3 (j) experiments. Error bars, s.e.m.


To examine the effect of suppressing USP30 in neurons directly relevant to Parkinson’s disease, we first used dopamine decarboxylase (Ddc)-GAL4 to drive UAS-dUSP30-RNAi specifically in dopamine neurons and other aminergic neurons. As a model of Parkinson’s disease, we treated flies with paraquat, a mitochondrial toxin linked to Parkinson’s disease. Following treatment with paraquat, the Ddc-GAL4 and UAS-dUSP30-RNAi parental fly lines showed reduced climbing performance (Fig. 5f). This behavioural deficit is related to dopamine depletion, as the defect was fully rescued by treatment with L-3,4-dihydroxyphenylalanine (L-DOPA) (Extended Data Fig. 9h). Paraquat treatment caused a 30-60% reduction in dopamine levels in fly heads without altering serotonin levels (Extended Data Fig. 9i, l). Knocking down dUSP30 using Ddc-GAL4 completely rescued the paraquat-induced climbing impairment (Fig. 5f, Supplementary Video 1). Restoration of climbing function was also observed using a different dopaminergic neuron driver (Th-GAL4) or ‘whole-body’ knockdown of USP30 (Actin-GAL4; Extended Data Fig. 9j, k). Strikingly, USP30 knockdown using Ddc-GAL4 or Th-GAL4 drivers also prevented paraquat-induced dopamine depletion (Extended Data Fig. 9l, m). We confirmed that the various fly lines ingested similar amounts of paraquat, as measured by liquid chromatography-mass spectrometry (see Methods). Paraquat treatment by itself did not cause obvious changes in mitochondrial morphology in fly muscle (Extended Data Fig. 8g) or brain (data not shown). These results show that suppression of USP30 can benefit dopaminergic neurons and motor behaviour in the face of mitochondrial toxicity associated with Parkinson’s disease.

We tested the effect of USP30 knockdown on the survival of flies fed with paraquat. Flies expressing dUSP30 RNAi in the whole body lived significantly longer than controls (Fig. 5g). Knockdown of other DUBs in flies (USP47, also known as Ubp64E, or dYOD1, also known as CG4603) did not improve survival; if anything, it exacerbated the rate of death in response to paraquat (Extended Data Fig. 10a-c). Remarkably, USP30 knockdown driven by Ddc-GAL4 was sufficient to provide a significant survival benefit, albeit less than whole-body USP30 knockdown (Fig. 5h). Similar results were obtained using Th-GAL4 (Extended Data Fig. 10d). As these results rely on only a single dUSP30 RNAi fly line, we confirmed RNAi specificity by introducing RNAi-resistant human USP30 cDNA into flies expressing dUSP30-RNAi (Extended Data Fig. 10e, f). Human USP30 cDNA (driven by either Ddc-GAL4 or Actin-GAL4) reversed the benefit provided by dUSP30-RNAi in paraquat-treated flies in both the climbing and survival assays (Fig. 5i, j; Extended Data Fig. 10g, h). These results imply that a significant portion of the organismal benefit of USP30 suppression is mediated in dopaminergic neurons, and they further reinforce the idea that USP30 can play a critical role in dopaminergic neuron dysfunction.

Since suppression of USP30 restored mitochondrial integrity in par- 
kin and pink1 mutant flies and functionally protected dopaminergic 
neurons against the mitochondrial toxin paraquat, our findings pro- 
vide in vivo evidence that inhibition of USP30 might be helpful in dis- 
eases caused by mitochondrial damage and dysfunction. 


METHODS SUMMARY 


Statistical tests and one-way ANOVA post-hoc tests are indicated in figure legends. 
For multiple comparison analysis, Dunn’s and Bonferroni’s post-hoc tests were 
used for all Kruskal-Wallis and two-way ANOVA tests, respectively. P values are 
represented as *P < 0.05, **P < 0.01 and ***P < 0.001. For details of experimental 
methods and statistical analysis, see Methods. 
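As an illustration of the reporting convention above, the sketch below applies a plain Bonferroni adjustment (each raw P value multiplied by the number of comparisons, capped at 1) and maps the result onto the asterisk thresholds. It is a simplified, hypothetical helper, not the authors' analysis code: statistics packages implement Bonferroni's post-hoc test on top of pairwise tests that use the pooled ANOVA error term, which this sketch does not reproduce. The 'n.s.' label and the example P values are assumptions.

```python
from typing import List


def bonferroni_adjust(p_values: List[float]) -> List[float]:
    """Multiply each raw P value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_stars(p: float) -> str:
    """Map a P value onto the *P < 0.05, **P < 0.01, ***P < 0.001 convention."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."  # label for non-significant results (assumed, not from the source)


if __name__ == "__main__":
    raw = [0.0002, 0.004, 0.03]  # hypothetical raw P values for three comparisons
    adjusted = bonferroni_adjust(raw)
    print([significance_stars(p) for p in adjusted])
```

Note that correction can move a comparison across a threshold: a raw P of 0.03 becomes 0.09 after adjusting for three comparisons.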
METHODS


DNA construction. For the DUB overexpression screen, a Flag-tagged DUB library consisting of 100 cDNAs was used. For transfection, the following constructs were subcloned into the β-actin promoter-based pCAGGS plasmid: USP30-Flag (rat), USP30-Flag (human), GFP-parkin (human), MYC-parkin (human), Flag-parkin (rat), red fluorescent protein (RFP)-parkin (human), PINK1-GFP (human), TOM20-MYC (human), LC3-GFP (human), HA-ubiquitin, PSD95-Flag, GFP, mito-GFP, and mt-Keima. Point mutations were generated using QuikChange II XL (Agilent Technologies) for the following constructs: USP30(C77S)-Flag (rat), USP30(C77A)-Flag (rat), USP30(C77S)-Flag (human), GFP-parkin(K161N) (human), GFP-parkin(G430D) (human). Mito-tagGFP2 (Evrogen) and TOM20-3KR-MYC (Blue Heron) were purchased. β-Gal and mito-roGFP expression plasmids were previously described. Short-hairpin sequences targeting the following regions were cloned into pSuper or pSuper-GFP-neo plasmids: rat PINK1 1 (TCAGGAGATCCAGGCAATT), rat PINK1 2 (CCAGTACCTTGAAGAGCAA), rat parkin 1 (GGAAGTGGTTGCTAAGCGA), rat parkin 2 (GAGGAAAAGTCACGAAACA), rat USP30 (CCAGAGCCCTGTTCGGTTT), human USP30 (CCAGAGTCCTGTTCGATTT), and firefly luciferase (CGTACGCGGAATACTTCGA).

Antibodies and reagents. The following antibodies were used for immunocytochemistry: rabbit anti-TOM20 (clone no. FL-145; catalogue no. sc11415), mouse anti-TOM20 (clone no. F-10; catalogue no. sc17764), goat anti-HSP60 (catalogue no. sc1052) (Santa Cruz Biotechnology); mouse anti-Flag-M2 (catalogue no. F1804), rabbit anti-Flag (catalogue no. F7425), mouse anti-MYC (catalogue no. M4439) (Sigma-Aldrich); chicken anti-GFP (catalogue no. A10262) (Invitrogen); mouse anti-p62 (catalogue no. BDB610833) (BD Biosciences); rabbit anti-LC3 (catalogue no. NB100-2220) (Novus Biologicals); mouse anti-FK2 (catalogue no. BML-PW8810) (Enzo Life Sciences); and mouse anti-LAMP1 (clone no. Ly1C6; catalogue no. ADI-VAM-EN001-D) (StressGen).

The following antibodies were used for immunoblotting: rabbit anti-TOM20 (clone no. FL-145; catalogue no. sc11415), goat anti-HSP60 (catalogue no. sc1052) (Santa Cruz Biotechnology); mouse anti-Mfn1 (clone no. 3C9; catalogue no. WH0055669M4), HRP-conjugated anti-Flag (catalogue no. A8592), mouse anti-MYC (catalogue no. M4439), rabbit anti-USP30 (catalogue no. HPA016952), rabbit anti-RHOT1 (MIRO1) (catalogue no. HPA010687), rabbit anti-TIMM8a (clone no. 2F11; catalogue no. WH0001678M1) (Sigma-Aldrich); rabbit anti-GFP (catalogue no. A11122), chicken anti-GFP (catalogue no. A10262) (Invitrogen); HRP-conjugated anti-GAPDH (clone no. 14C10; catalogue no. 3683), HRP-conjugated anti-β-actin (clone no. 13E5; catalogue no. 5125), HRP-conjugated anti-β-tubulin (clone no. 9F3; catalogue no. 5346), rabbit anti-VDAC (clone no. D73D12; catalogue no. 4661), mouse anti-parkin (clone no. Prk8; catalogue no. 4211) (Cell Signaling Technology); rabbit anti-Tom70 (catalogue no. 14528-1-AP) (Proteintech Group); mouse anti-ubiquitin (clone no. FK2; catalogue no. PW8810) (Enzo Life Sciences); HRP-conjugated anti-HA (clone no. 3F10; catalogue no. 12013819001) (Roche); mouse anti-p62 (catalogue no. BDB610833) (BD Biosciences); rabbit anti-PINK1 (catalogue no. BC100-494), rabbit anti-LC3 (catalogue no. NB100-2220) (Novus Biologicals); and rabbit anti-USP30 (generated by immunizing rabbits with purified human USP30 amino acids 65-517).

Anti-HA affinity matrix beads (Roche Applied Science) were used for immunoprecipitation experiments. Adeno-associated virus type 2 (AAV2) particles expressing parkin, PINK1 and USP30 shRNAs were produced by Vector Biolabs, Inc. using the pAAV-BASIC-CAGeGFP-WPRE vector containing an H1 promoter and the same shRNA expression cassette as the pSuper vectors.

The following reagents were purchased as indicated: blasticidin S, zeocin, Lipofectamine 2000, Lipofectamine LTX PLUS, LysoTracker Green DND-26 (Invitrogen); PhosSTOP phosphatase inhibitor tablets, cOmplete EDTA-free protease inhibitor tablets, DNase I (Roche Applied Science); carbonyl cyanide 3-chlorophenylhydrazone
(CCCP), doxycycline, dimethyl sulphoxide, ammonium chloride, rotenone, DTT, 
aldrithiol, paraquat dichloride, MG132, epoxomicin (Sigma-Aldrich); bafilomycin 
(Cayman Chemical); L-3,4-dihydroxyphenylalanine (Fluka Analytical); N-ethylmaleimide 
(Thermo Scientific); and hygromycin (Clontech Laboratories). 

Transfection and immunocytochemistry. All heterologous cells were transfected with Lipofectamine 2000 for cDNA expression and Lipofectamine RNAiMAX for siRNA knockdown experiments, according to the manufacturer’s instructions (Invitrogen). siRNAs were purchased from Dharmacon as siGenome pools (non-silencing pool 2 was used as the control siRNA). Hippocampal cultures were prepared as described previously and transfected with Lipofectamine LTX PLUS (Invitrogen) using 1.8 µg DNA, 1.8 µl PLUS reagent and 6.3 µl LTX reagent. Following drug treatments, cells were fixed with 4% paraformaldehyde/4% sucrose in phosphate-buffered saline (PBS, pH 7.4) (Electron Microscopy Sciences). Following permeabilization (0.1% Triton X in PBS), blocking (2% BSA in PBS) and primary antibody incubation, antibodies were visualized using Alexa dye-conjugated secondary antibodies (Invitrogen). All immunocytochemistry images were acquired with a Leica SP5 laser scanning microscope with a ×40/1.25 oil objective (0.34 µm per pixel resolution, 1 µm confocal z-step size).

HEK-293 and SH-SY5Y stable cell line generation. Stably transfected HEK-293 cell lines expressing GFP-parkin (human) wild type, K161N, and G430D were generated by co-transfecting Flp-In 293 cells with a pOG44 Flp-recombinase expression vector (Invitrogen) and a pcDNA5-FRT vector (Invitrogen) expressing the corresponding constructs under a CMV promoter. Cell lines were selected and maintained using 50 µg ml⁻¹ hygromycin. An inducible HEK-293 stable cell line expressing GFP-parkin (human) was generated by co-transfecting Flp-In T-REx 293 cells with pOG44 and a pcDNA5-FRT-TO vector (Invitrogen) expressing GFP-parkin (human). This line was selected and maintained using 50 µg ml⁻¹ hygromycin and 15 µg ml⁻¹ blasticidin. SH-SY5Y stable cells were generated similarly with a Flp-In inducible parental cell line using pcDNA5-FRT-TO and maintained under 75 µg ml⁻¹ hygromycin and 3 µg ml⁻¹ blasticidin. Cell lines were quality controlled by STR analysis and by testing for mycoplasma contamination.

Isolation and identification of ubiquitin modifications by mass spectrometry. To identify parkin substrates, HEK-293 cells stably expressing inducible GFP-parkin (human) were induced using doxycycline (1 µg ml⁻¹) for 24 h, then treated with 5 µM CCCP or DMSO vehicle control for 2 h. To determine USP30 substrates, HEK-293 cells were transfected with human USP30 shRNA using Lipofectamine 2000 (Invitrogen) for 6 days, then treated as above.

Immunoaffinity isolation and mass spectrometry methods were used to enrich and identify K-GG peptides from digested protein lysates as previously described. Cell lysates were prepared in lysis buffer (8 M urea, 20 mM HEPES pH 8.0, 1 mM sodium orthovanadate, 2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate) by brief sonication on ice. Protein samples (60 mg) were reduced at 60 °C for 20 min in 4.1 mM DTT, cooled for 10 min on ice, and alkylated with 9.1 mM iodoacetamide for 15 min at room temperature in the dark. Samples were diluted fourfold using 20 mM HEPES pH 8.0 and digested with 10 µg ml⁻¹ trypsin overnight at room temperature. Following digestion, TFA was added to a final concentration of 1% to acidify the peptides before desalting on a Sep-Pak C18 cartridge (Waters). Peptides were eluted from the cartridge in 40% ACN/0.1% TFA, flash frozen and lyophilized for 48 h. Dry peptides were gently resuspended in 1.4 ml 1× IAP buffer (Cell Signaling Technology) and cleared by centrifugation for 5 min at 1,800g. Precoupled anti-K-GG beads (Cell Signaling Technology) were washed in 1× IAP buffer before contacting the digested peptides.

Immunoaffinity enrichment was performed for 2 h at 4 °C. Beads were washed twice with IAP buffer and four times with water before peptides were eluted twice in 0.15% TFA for 10 min each at room temperature. Immunoaffinity-enriched peptides were desalted using STAGE-Tips as previously described.

Liquid chromatography-mass spectrometry (LC-MS) analysis was performed on an LTQ-Orbitrap Velos mass spectrometer operating in data-dependent top-15 mode. Peptides were injected onto a 0.1 × 100-mm Waters 1.7-µm BEH130 C18 column using a NanoAcquity UPLC and separated at 1 µl min⁻¹ using a two-stage linear gradient in which solvent B ramped first from 2% to 25% over 85 min and then from 25% to 40% over 5 min. Peptides eluting from the column were ionized and introduced to the mass spectrometer using an ADVANCE source (Michrom-Bruker). In each duty cycle, one full MS scan was collected at 60,000 resolution in the Orbitrap, followed by up to 15 MS/MS scans in the ion trap on monoisotopic, charge-state-defined precursors (z > 1). Ions selected for MS/MS (±20 p.p.m.) were subjected to dynamic exclusion for 30 s.
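The two-stage gradient can be written as a piecewise-linear function of time. The sketch below is illustrative only and assumes t = 0 at the start of the first ramp; loading, wash and re-equilibration steps are not described in the text and are therefore omitted.

```python
def percent_b(t_min: float) -> float:
    """Return %B at t_min minutes after the gradient starts (t = 0 assumed)."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= 85.0:
        # stage 1: 2% -> 25% B over 85 min
        return 2.0 + (25.0 - 2.0) * t_min / 85.0
    if t_min <= 90.0:
        # stage 2: 25% -> 40% B over the next 5 min
        return 25.0 + (40.0 - 25.0) * (t_min - 85.0) / 5.0
    raise ValueError("time lies beyond the described 90-min gradient")


if __name__ == "__main__":
    for t in (0, 42.5, 85, 87.5, 90):
        print(t, round(percent_b(t), 2))
```

The second stage is much steeper (3% B per minute versus about 0.27% B per minute in the first stage).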

Mass spectral data were converted to mzXML for loading into a relational database. MS/MS spectra were searched using Mascot against a concatenated target-decoy database of tryptic peptides from human proteins (UniProt) and common contaminants. Precursor ion mass tolerance was set to ±50 p.p.m. Fixed modification of carbamidomethyl cysteine (+57.0214) and variable modifications of oxidized methionine (+15.9949) and K-GG (+114.0429) were considered. Linear discriminant analysis (LDA) was used to filter peptide spectral matches (PSMs) from each run to a 5% false discovery rate (FDR) at the peptide level, and subsequently to a 2% protein-level FDR as an aggregate of all runs (<0.5% peptide-level FDR). Localization scores were generated for each K-GG PSM using a modified version of the AScore algorithm, and positions of the modifications were localized accordingly as the AScore sequence. Given that trypsin cannot cleave adjacent to ubiquitin-modified lysines, PSMs in which the AScore sequence reports a -GG modification on the C-terminal lysine are dubious. Possible exceptions would be lysines at the C termini of proteins (or in vivo truncation products) and PSMs stemming from in-source fragmentation of a bona fide K-GG peptide. To establish the most reliable data set for downstream analysis, PSMs in which the AScore sequence reported a C-terminal lysine were split into two groups: those with an available internal lysine residue to which the -GG could alternatively be localized, and those that lacked an available lysine. PSMs bearing a C-terminal K-GG but lacking an available lysine were removed from consideration in downstream analyses. For the remaining PSMs, the -GG modification was relocalized to the available lysine closest to the C terminus.
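The C-terminal K-GG rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes that an "available" lysine is one that does not already carry another modification (the hypothetical `occupied` argument). It also does not implement the noted exceptions (protein C-terminal lysines and in-source fragments).

```python
from typing import AbstractSet, Optional


def relocalize_c_terminal_kgg(
    peptide: str, site: int, occupied: AbstractSet[int] = frozenset()
) -> Optional[int]:
    """Apply the C-terminal K-GG rule to one PSM.

    peptide:  one-letter sequence of the tryptic peptide
    site:     0-based index of the lysine that AScore reported as K-GG
    occupied: indices of lysines already carrying another modification
              (treated as unavailable; an assumption about 'available')

    Returns the index to use downstream, or None if the PSM should be removed.
    """
    if not 0 <= site < len(peptide) or peptide[site] != "K":
        raise ValueError("reported K-GG site must point at a lysine")
    if site != len(peptide) - 1:
        return site  # internal lysine: keep the AScore localization
    # A C-terminal K-GG is dubious because trypsin does not cleave after a
    # ubiquitin-modified lysine, so scan inward from the C terminus for the
    # closest available internal lysine.
    for i in range(len(peptide) - 2, -1, -1):
        if peptide[i] == "K" and i not in occupied:
            return i
    return None  # no alternative lysine: drop the PSM


if __name__ == "__main__":
    print(relocalize_c_terminal_kgg("AGKLMEK", 6))  # hypothetical peptide: relocalized to index 2
    print(relocalize_c_terminal_kgg("AGLMEK", 5))   # hypothetical peptide: None, PSM removed
```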
Both lists, including the latent hits from the decoy protein sequences, are provided as supplementary tables to permit post hoc analysis.
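The decoy hits matter because they are how the PSM-level FDR is estimated. The sketch below shows only the target-decoy step, using the common estimate FDR ≈ (decoys above a score cutoff) / (targets above the cutoff). The LDA scoring model and the protein-level roll-up used in this study are not reproduced, and the scores shown are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    score: float    # discriminant score (higher = better); example values are hypothetical
    is_decoy: bool  # True if the match is to a decoy sequence


def targets_at_fdr(psms: List[PSM], max_fdr: float) -> List[PSM]:
    """Keep target PSMs above the most permissive score cutoff whose
    estimated FDR (decoy count / target count above the cutoff) is <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    n_target = n_decoy = 0
    cutoff_index = -1
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # only evaluate cutoffs between distinct scores so ties are never split
        is_boundary = i == len(ranked) - 1 or ranked[i + 1].score != psm.score
        if is_boundary and n_target > 0 and n_decoy / n_target <= max_fdr:
            cutoff_index = i
    return [p for p in ranked[: cutoff_index + 1] if not p.is_decoy]


if __name__ == "__main__":
    demo = [PSM(10, False), PSM(9, False), PSM(8, False), PSM(7, True),
            PSM(6, False), PSM(5, False), PSM(4, True), PSM(3, False)]
    print(len(targets_at_fdr(demo, 0.25)))
```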

A modified version of the VistaGrande algorithm, termed XQuant, was used to interrogate the unlabelled peak areas for individual K-GG peptides, guided by direct PSMs or accurate precursor ion and retention time matching (cross quantification). For direct PSMs, quantification of the unlabelled peak area was performed using fixed mass and retention tolerances, as previously described. To enable cross quantification within XQuant, retention time correlation across pairwise instrument analyses was determined based on high-scoring peptide sequences identified by between one and four PSMs across all analyses within an experiment. Matched retention time pairings were modelled using a linear least squares regression model to yield the retention time correlation equation. In instrument runs where a peptide was not identified by a discrete MS/MS spectrum, cross quantification was carried out by seeding the XQuant algorithm with the calculated mass of the precursor ion and its predicted retention time derived from the regression model. While the m/z tolerance was fixed, the retention time tolerance was dynamically adjusted for each pairwise instrument run. In cases where peptides were not confidently identified within a given instrument run but were identified in multiple other runs, multiple cross quantification events were performed to ensure data quality. XQuant results were filtered to a heuristic confidence score of 83 or greater, as previously described. Full scan peak area measurements arising from multiple quantification events of the same m/z within a single run were grouped together if their peak boundaries in retention time overlapped. From such a group, the peak with the largest total peak area was chosen as its single representative.
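XQuant itself is not described in enough detail to reproduce. The sketch below covers only two steps that the text states explicitly: a linear least-squares mapping of retention times between two runs, and the grouping of overlapping peaks with the largest-area peak kept per group. Two choices here are assumptions: chains of overlapping peaks are merged into one group, and the dynamic retention-time tolerance and confidence scoring are omitted. All function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def fit_rt_correlation(rt_run_a: Sequence[float],
                       rt_run_b: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line rt_b = slope * rt_a + intercept, fitted on
    retention times of the same anchor peptides observed in both runs."""
    n = len(rt_run_a)
    if n < 2 or n != len(rt_run_b):
        raise ValueError("need at least two paired retention times")
    mean_a = sum(rt_run_a) / n
    mean_b = sum(rt_run_b) / n
    sxx = sum((a - mean_a) ** 2 for a in rt_run_a)
    if sxx == 0:
        raise ValueError("anchor retention times in run A must not all be equal")
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(rt_run_a, rt_run_b))
    slope = sxy / sxx
    return slope, mean_b - slope * mean_a


def predict_rt(model: Tuple[float, float], rt_in_run_a: float) -> float:
    """Predicted retention time in run B for a peptide identified only in run A."""
    slope, intercept = model
    return slope * rt_in_run_a + intercept


@dataclass
class Peak:
    start: float  # peak boundaries in retention time (min)
    end: float
    area: float


def representative_peaks(peaks: List[Peak]) -> List[Peak]:
    """Group peaks of one m/z in one run whose RT boundaries overlap
    (chains of overlaps are merged) and keep the largest-area peak per group."""
    groups: List[List[Peak]] = []
    group_end = float("-inf")
    for peak in sorted(peaks, key=lambda p: p.start):
        if groups and peak.start <= group_end:
            groups[-1].append(peak)
            group_end = max(group_end, peak.end)
        else:
            groups.append([peak])
            group_end = peak.end
    return [max(group, key=lambda p: p.area) for group in groups]


if __name__ == "__main__":
    model = fit_rt_correlation([10.0, 20.0, 30.0], [12.0, 22.0, 32.0])  # hypothetical anchors
    print(predict_rt(model, 25.0))
    print(representative_peaks([Peak(10, 12, 5), Peak(11, 13, 8), Peak(20, 21, 3)]))
```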

To identify candidate substrates of parkin and USP30, graphical analysis and mixed-effect modelling were applied to XQuant data. A mixed-effect model was fit to the AUC data for each protein. ‘Treatment’ (for example, Control, parkin overexpression/USP30 knockdown, CCCP, Combo) was a categorical fixed effect and ‘Peptide’ was fit as a random effect. False discovery rates (FDR) were calculated based on the P values of each treatment versus Control. Fold changes and P values of mean AUC from Combo versus Control and Combo versus CCCP were used in preparing ‘LiME’ plots. The mixed-effect model was fit in R using the ‘nlme’ package.
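The text does not name the procedure used to convert per-protein P values into FDRs. Assuming the widely used Benjamini-Hochberg step-up adjustment, a minimal sketch looks like this. It is an illustration only, and the P values in the usage line are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest P down so adjusted values stay monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # hypothetical per-protein P values
```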

Preparation of cell lysates and immunoprecipitation. For total lysate experiments, cells were lysed after 24 h in SDS sample buffer (Invitrogen) containing sample reducing agent (Invitrogen) and boiled at 95 °C for 5 min. Total lysates were resolved by SDS-PAGE and analysed by immunoblotting. For immunoprecipitation experiments, cells were treated with the indicated concentrations and durations of CCCP at 24 h (overexpression experiments) or 6 days (knockdown experiments) post-transfection. Cells were then lysed in 0.5% SDS in Tris-buffered saline (137 mM NaCl, 5 mM KCl, 1.5 mM Na2HPO4, 25 mM Tris base, 1 mM CaCl2·2H2O, 0.5 mM MgCl2·6H2O, pH 7.5) and heated at 70 °C for 10 min. Lysates were then diluted in immunoprecipitation buffer (50 mM HEPES, 150 mM NaCl, 10% glycerol, 1% Triton X, protease inhibitors (Roche Applied Science), phosphatase inhibitors (Roche Applied Science), DNase I (Roche Applied Science), 2 mM N-ethylmaleimide (Thermo Scientific), pH 7.4), cleared by centrifugation at 31,000g for 10 min, and incubated overnight with anti-HA affinity matrix beads (Roche Applied Science). Inputs and anti-HA immunoprecipitates were resolved by SDS-PAGE and analysed by immunoblotting.

Mitochondria fractionation. Subcellular fractionation was performed using the 
FOCUS SubCell Kit (G Biosciences) from ~P60 adult male rat forebrain. 

Drosophila stocks. The following Drosophila lines were obtained for analysis: Actin5C-GAL4 (Bloomington Drosophila Stock Center, 4414), Ddc-GAL4 (Bloomington Drosophila Stock Center, 7010), ple-GAL4 (referred to here as Th-GAL4, Bloomington Drosophila Stock Center, 8848), UAS-CG3016-RNAi (referred to here as UAS-dUSP30-RNAi, NIG-Fly Stock Center, 3016R-2), UAS-CG5486-RNAi (referred to here as UAS-dUSP47-RNAi, NIG-Fly Stock Center, 5486R-3), UAS-CG4603-RNAi (referred to here as UAS-dYOD1-RNAi, NIG-Fly Stock Center, 4603R-2), pink1”? (Bloomington Drosophila Stock Center, 34749), park*® (Leo Pallanck, University of Washington), and park’ (Bloomington Drosophila Stock Center, 34747). For USP30 knockdown experiments, Actin5C-GAL4 or Ddc-GAL4 was recombined onto the same chromosome as UAS-dUSP30-RNAi using standard genetic techniques and balanced over CyO, y+. For experiments using Th-GAL4, flies were generated by crossing Th-GAL4 with UAS-dUSP30-RNAi and were not balanced. The X chromosome of flies in pink1”? experiments contained w; all other lines used contained y, w. Wild-type controls used were w or y, w, respectively.

UAS-hUSP30 and UAS-parkin constructs were generated by PCR amplifying 
human USP30 or parkin cDNA (Origene), respectively, with primers adding restriction sites EcoRI (5’) and NotI (3’) and subcloning into pUAST-attB (gift of Konrad
Basler). Parkin R275W and Q311X mutations were made using QuikChange II XL 
(Agilent Technologies). Injections were performed by BestGene, Inc. (Chino Hills, 
CA) for integration into the 86Fb attP landing site (Bloomington 24749). 

Flies were raised on Nutri-Fly ‘German Food’ Formulation (Genesee, 66-115), prepared per the manufacturer’s instructions. All flies were raised at 25 °C and crossed using standard genetic techniques. All experiments were performed using age-matched male flies.

Quantitative RT-PCR. RNA and subsequent cDNA were obtained from five adult
male fly heads following the manufacturer’s instructions (Qiagen RNeasy Plus kit, Applied
Biosystems High Capacity cDNA Reverse Transcription kit). Quantitative RT-PCR 
was performed using an Applied Biosystems ViiA7 Real-Time PCR system using 
TaqMan Assays Dm01796115_g1 and Dm01796116_g1 (Drosophila CG3016 (USP30)), 
Hs00261902_m1 (human USP30), Dm01795269_g1 (Drosophila CG5486 (USP47)), 
and Dm01840115_s1 (Drosophila CG4603 (YOD1)). Dm02134593_g1 (RpII140) 
was used for normalization. 
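The Methods name RpII140 as the normalizer but do not state how relative expression was computed. One common choice is the 2^-ΔΔCt method, which assumes roughly 100% amplification efficiency for both assays. The sketch below illustrates that assumed method and is not the authors' stated calculation. The Ct values in the usage line are hypothetical, not data from this study.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_control: float, ct_reference_control: float) -> float:
    """2^-ddCt: target expression in a sample relative to a control sample,
    each first normalized to a reference gene (e.g. RpII140)."""
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_ct_sample - delta_ct_control)


if __name__ == "__main__":
    # hypothetical Ct values, not data from this study
    fold = relative_expression(28.0, 20.0, 25.0, 20.0)
    print(f"relative dUSP30 mRNA: {fold:.3f} ({(1 - fold) * 100:.1f}% reduction)")
```

Each additional cycle of ΔΔCt halves the estimated relative expression, so a ΔΔCt of 3 corresponds to 12.5% of control levels.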

Transmission electron microscopy of Drosophila indirect flight muscles. Adult 
male thoraxes of indicated ages and genotypes were isolated from the remainder of 
the body, then longitudinally hemi-sectioned and immediately fixed and processed 
as previously described. Briefly, samples were fixed in modified Karnovsky’s fixative (2% paraformaldehyde and 2.5% glutaraldehyde in 0.1 M sodium cacodylate
buffer, pH 7.2), post-fixed in 1% aqueous osmium tetroxide, dehydrated through a 
series of ethanol (50%, 70%, 90%, 95%, 100%) followed by propylene oxide treat- 
ments, and embedded in Eponate 12 (Ted Pella, Redding, CA). Ultrathin sections 
(80 nm) were cut with an Ultracut microtome (Leica), stained with 3.5% aqueous 
uranyl acetate and 0.2% lead citrate and examined in a JEOL JEM-1400 transmission electron microscope (TEM) at 120 kV. Digital images were captured with a
GATAN Ultrascan 1000 CCD camera. 

Climbing assays. Flies of indicated ages and genotypes were assayed as follows. For paraquat-fed experiments, 1-day-old adult males were fed a solution containing 5% sucrose only (in water), 5% sucrose + 10 mM paraquat (in water), or 5% sucrose + 10 mM paraquat + 1 mM L-3,4-dihydroxyphenylalanine (in water) on saturated Whatman paper. After 48 h of treatment, flies were anesthetized using carbon dioxide and randomly transferred in groups of ten to fresh vials containing only 1% agarose (in water) for a 1-h recovery period from the effects of carbon dioxide. The flies were then transferred into new glass test tubes, gently tapped to the bottom, and scored for their ability to climb. The number of flies climbing 15 cm vertically was recorded at 12 s for pink1”? experiments and at 30 s for all other experiments; the climbing index was calculated as the percentage of flies that climbed 15 cm within the given time.
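The climbing index is a simple percentage. The sketch below computes it per vial and then summarizes replicates as mean ± s.e.m., the error measure used in the figure legends. Treating each vial as one replicate is an assumption, and the counts in the usage line are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def climbing_index(n_climbed: int, n_tested: int) -> float:
    """Percentage of flies in one vial that climbed 15 cm within the time limit."""
    if n_tested <= 0 or not 0 <= n_climbed <= n_tested:
        raise ValueError("need 0 <= n_climbed <= n_tested and n_tested > 0")
    return 100.0 * n_climbed / n_tested


def mean_and_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (s.e.m.) across replicates."""
    if len(values) < 2:
        raise ValueError("s.e.m. needs at least two replicates")
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # hypothetical counts from three vials of ten flies
    indices = [climbing_index(k, 10) for k in (7, 9, 8)]
    m, sem = mean_and_sem(indices)
    print(f"climbing index {m:.1f} +/- {sem:.1f}%")
```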

Determination of neurotransmitter levels. Flies of indicated ages and genotypes were assayed as follows. For paraquat-fed experiments, 1-day-old adult males were fed a solution containing 5% sucrose only (in water) or 5% sucrose + 10 mM paraquat (in water) on saturated Whatman paper. After 48 h of treatment, flies were anesthetized using carbon dioxide. Single fly heads were dissected off and immediately placed into 500 µl cold lysis solution on ice. Heads were homogenized using a TissueLyser II (Qiagen) with a 3-mm tungsten carbide bead (Qiagen) at a frequency of 30 Hz for 3 min. Homogenates were spun down and the supernatants used in subsequent ELISAs.

For dopamine ELISA, the lysis solution contained 0.01 N hydrochloric acid, 1 mM EDTA and 4 mM sodium metabisulphite. ELISA was performed according to the manufacturer’s instructions (LDN, BA E-5300). For serotonin ELISA, the lysis solution was provided in the kit as ‘Diluent’ (LDN, BA E-5900).
Determination of ingested paraquat concentration. 1-day-old adult male flies
were fed a solution containing 5% sucrose only (in water) or 5% sucrose + 10 mM
paraquat (in water) on saturated Whatman paper. After 48 h of treatment, 15 flies
were collected per condition and homogenized in 100 µl water. Standard curve
samples were generated by spiking appropriate amounts of paraquat into
homogenates from untreated flies. The samples were then vortex mixed, and 200 µl of
acetonitrile containing internal standard (propranolol) was added. The samples were
vortexed again and centrifuged at 10,000g for 10 min. 100 µl of supernatant was
transferred to a new plate containing 200 µl of water and analysed by LC-MS/MS
to quantify paraquat concentrations. The LC-MS/MS system consisted of an
Agilent 1100 series HPLC system (Santa Clara, CA) and an HTS PAL autosampler
from CTC Analytics (Carrboro, NC) coupled with a 4000 Q TRAP MS and a
TurboIonSpray ion source from Applied Biosystems (Foster City, CA). HPLC
separation was performed on a Waters Atlantis dC18 column (3 µm, 100 × 2.1 mm) with
a Krud Katcher guard column from Phenomenex. Quantitation was carried out
using multiple reaction monitoring (MRM) with the transition 185.1 → 165.1 for
paraquat and 260.2 → 183.1 for propranolol. The lower and upper limits of the assay
are 10 µM and 1000 µM, respectively. Quantification used a calibration
curve constructed by plotting the analyte/IS peak-area ratio versus
the nominal concentration of paraquat, fitted by weighted (1/x²) linear regression.
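The calibration step can be sketched as a closed-form weighted least-squares fit of peak-area ratio against nominal concentration, followed by back-calculation of unknowns. This sketch assumes 1/x² weights (pass `power=1` for 1/x) and uses made-up, noise-free standards; the function names are hypothetical, and in practice the instrument software performs this fit:

```python
from typing import Sequence, Tuple


def weighted_linear_fit(
    conc: Sequence[float], ratio: Sequence[float], power: int = 2
) -> Tuple[float, float]:
    """Fit ratio = slope * conc + intercept with weights w = 1 / conc**power."""
    w = [1.0 / c**power for c in conc]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, conc))
    swy = sum(wi * y for wi, y in zip(w, ratio))
    swxx = sum(wi * x * x for wi, x in zip(w, conc))
    swxy = sum(wi * x * y for wi, x, y in zip(w, conc, ratio))
    det = sw * swxx - swx**2  # determinant of the weighted normal equations
    slope = (sw * swxy - swx * swy) / det
    intercept = (swxx * swy - swx * swxy) / det
    return slope, intercept


def back_calculate(ratio: float, slope: float, intercept: float) -> float:
    """Convert a measured analyte/IS peak-area ratio into a concentration."""
    return (ratio - intercept) / slope


# Hypothetical calibration standards spanning the assay range.
standards = [10.0, 50.0, 100.0, 500.0, 1000.0]
ratios = [0.002 * c + 0.01 for c in standards]
slope, intercept = weighted_linear_fit(standards, ratios)
print(round(back_calculate(0.51, slope, intercept), 6))  # 250.0
```

Weighting by 1/x² keeps the high standards from dominating the fit, so relative error stays roughly constant across the range.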

Using this protocol, we confirmed that the various fly lines ingested similar amounts
of paraquat (average mass of paraquat per fly: UAS-dUSP30^RNAi, 3.2 µg; Actin-GAL4,
2.7 µg; Actin-GAL4 > UAS-dUSP30^RNAi, 2.7 µg).

Dopaminergic neuron degeneration assays. Fly brains were dissected as prev- 
iously described*' with the following modifications. Brains were fixed using 4% 
paraformaldehyde in PBS, primary antibody used was rabbit anti-tyrosine hydro- 
xylase (Pel Freez catalogue no. P40101-0) at 1:100, and secondary antibody used 
was goat anti-rabbit 488 (Invitrogen) at 1:400. Similar results were obtained using
rabbit anti-tyrosine hydroxylase from Millipore (catalogue no. AB152). All images
were quantified in a blinded manner. For flies of various parkin genotypes, brains 
were dissected at 20 days of age unless otherwise noted. 

Survival assays. Ten 1-day-old adult male flies per vial were fed a solution
containing 5% sucrose only (in water) or 5% sucrose + 10 mM paraquat (in water) on sat-
urated Whatman paper. The number of live flies was counted at described intervals. 
Data collection and statistics. No statistical methods were used to pre-determine 
sample sizes. Sample sizes were kept similar between experimental groups and rep- 
licates of experiments (for example, ~25-30 neurons per well were imaged for mt- 
Keima experiments from each culture; ~20-50 cells were imaged per well in cell 
line experiments). All experiments and analyses were performed blind (blind
to the identity of the experimental groups during image acquisition and analysis;
blind to the treatment and genotype groups in fly experiments). No data were
excluded from the analysis. Culture wells were randomly assigned to plasmid DNAs for
transfection in multi-well plates. Imaging fields were randomly chosen during image 
acquisition. Flies were randomly assigned to treatment groups. 

To compute P values, the Mann-Whitney test, the Kruskal-Wallis test, and one-way
or two-way ANOVA were used. Normality was assessed with the
Kolmogorov-Smirnov test. Bartlett’s test was used to test for equal variances among
the groups being compared in one-way ANOVA tests. For multiple comparisons,
the following post-hoc tests were used: Dunn’s multiple comparison test
(following Kruskal-Wallis non-parametric tests), Dunnett’s multiple comparison 
test (for comparisons to a single ‘control’ group, following one-way ANOVA tests), 
Bonferroni’s multiple comparison test (for comparisons between multiple condi- 
tions, following one-way or two-way ANOVA tests). P values are represented as 
*P < 0.05, **P < 0.01 and ***P < 0.001. GraphPad v5 was used for the statistics. 
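For readers reproducing the non-parametric workflow outside GraphPad, the sketch below implements a tie-corrected Kruskal-Wallis test followed by Dunn's pairwise z tests in plain Python. It is a generic textbook implementation under stated assumptions (two-sided tests, Bonferroni-style multiplication of the Dunn P values, chi-square approximation for H), not the GraphPad v5 code, and its adjusted P values may differ slightly from GraphPad's output:

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed forms)."""
    y = x / 2
    if df % 2 == 0:
        return math.exp(-y) * sum(y**j / math.factorial(j) for j in range(df // 2))
    return math.erfc(math.sqrt(y)) + sum(
        math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )


def kruskal_dunn(groups):
    """Tie-corrected Kruskal-Wallis H, its P value, and adjusted Dunn P values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sizes = [len(g) for g in groups]
    mean_ranks, start = [], 0
    for size in sizes:
        mean_ranks.append(sum(ranks[start:start + size]) / size)
        start += size
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t**3 - t for t in counts.values())  # tie-correction term
    h = 12 / (n * (n + 1)) * sum(s * m * m for s, m in zip(sizes, mean_ranks)) - 3 * (n + 1)
    h /= 1 - ties / (n**3 - n)
    p_h = chi2_sf(h, len(groups) - 1)
    var_unit = n * (n + 1) / 12 - ties / (12 * (n - 1))
    n_pairs = len(groups) * (len(groups) - 1) // 2
    dunn = {}
    for a, b in combinations(range(len(groups)), 2):
        se = math.sqrt(var_unit * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        dunn[(a, b)] = min(1.0, n_pairs * math.erfc(abs(z) / math.sqrt(2)))
    return h, p_h, dunn


h, p, dunn = kruskal_dunn([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(p, 4))  # 7.2 0.0273
```

On these toy data only the first-versus-third group comparison stays below 0.05 after adjustment.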
Extended Data Figure 1 | USP30 is a mitochondrial protein.

a, Immunostaining of transfected USP30-Flag (red) and mitochondria-
targeted GFP (green) in cultured rat hippocampal neurons. Merge is shown in
colour; individual channels in greyscale. Scale bar, 5 µm. b, Immunostaining of
SH-SY5Y cells transfected with control or USP30 siRNA. 3 days after
transfection, cells were fixed and immunostained for endogenous USP30 and
HSP60. USP30 siRNA primarily decreases mitochondrial USP30 antibody
staining (scale bar, 5 µm). Higher magnification images of the boxed regions are
shown in the right panels (scale bar, 2 µm). c, Immunoblots of cytoplasm- and
mitochondria-enriched fractions from rat brain with USP30, HSP60 and
GAPDH antibodies. d, Immunoblots of cell lysates from HEK-293 cells stably
expressing GFP-parkin, transfected with the indicated control (β-Gal) and
USP30 constructs. 24 h after transfection, cells were treated with CCCP (5 µM,
2 h) and lysed. e, Quantification of immunoblot signal for GFP-parkin from
d, normalized to actin. *P < 0.05 by Kruskal-Wallis test and Dunn’s multiple
comparison test. n = 6 experiments. Error bars represent s.e.m.
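Several panels report band intensities "normalized to actin". A minimal sketch of that normalization, assuming the loading-corrected values are then expressed relative to the mean of the control lanes (the second scaling step, the lane layout and the numbers are assumptions for illustration):

```python
from typing import List, Sequence


def normalize_to_loading(target: Sequence[float], loading: Sequence[float],
                         control_lanes: Sequence[int]) -> List[float]:
    """Divide each band by its loading-control band, then scale so control lanes average 1."""
    ratios = [t / a for t, a in zip(target, loading)]
    control_mean = sum(ratios[i] for i in control_lanes) / len(control_lanes)
    return [r / control_mean for r in ratios]


# Hypothetical band intensities (arbitrary units); lane 0 is the control.
print(normalize_to_loading([200.0, 300.0, 100.0], [100.0, 100.0, 50.0], [0]))
# [1.0, 1.5, 1.0]
```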
Extended Data Figure 2 | USP30 counteracts mitochondrial ubiquitination
and recruitment of p62 and LC3-GFP in CCCP-treated parkin-expressing
cells. a, Immunostaining of SH-SY5Y cells co-transfected with GFP-parkin
and the indicated control (β-Gal) and Flag-tagged USP30 constructs. 24 h after
transfection, cells were treated with CCCP (20 µM, 4 h) and immunostained for
GFP, Flag, endogenous TOM20, and polyubiquitin chains (detected with FK2
antibody). Co-localization of GFP-parkin (shown in red) and polyubiquitin
(shown in green) is shown in the right panel. Scale bars, 5 µm. b, Quantification
of GFP-parkin-associated polyubiquitin staining intensity from a, normalized
by GFP-parkin area (integrated fluorescence intensity of FK2 staining
colocalizing with GFP-parkin/area of GFP-parkin staining). ***P < 0.001 by
Kruskal-Wallis test and Dunn’s multiple comparison test. n = 6 experiments.
Error bars represent s.e.m. c, Immunostaining of HeLa cells co-transfected with
GFP-parkin and the indicated control (β-Gal) and Flag-tagged USP30
constructs. Cells were treated as in a and immunostained for GFP, Flag,
endogenous p62, and HSP60. Co-localization of GFP-parkin (shown in red)
and p62 (shown in green) is shown in the right panel. Scale bars, 10 µm.
d, Quantification of GFP-parkin-associated p62 staining intensity from
c, normalized by GFP-parkin area (integrated fluorescence intensity of p62
staining colocalizing with GFP-parkin/area of GFP-parkin staining).
*P < 0.05 by Kruskal-Wallis test and Dunn’s multiple comparison test. n = 5
experiments. Error bars represent s.e.m. e, Immunostaining of HeLa cells
co-transfected with RFP-parkin, LC3-GFP and the indicated control (β-Gal)
and Flag-tagged USP30 constructs. Cells were treated as in a and
immunostained for GFP, Flag and endogenous HSP60. Co-localization of
RFP-parkin (shown in red) and LC3-GFP (shown in green) is shown in the
right panel. Scale bars, 10 µm. f, Quantification of RFP-parkin-associated
LC3-GFP puncta area from e, normalized by RFP-parkin area (area of LC3-GFP
puncta colocalizing with RFP-parkin/area of RFP-parkin staining). *P < 0.05
by Kruskal-Wallis test and Dunn’s multiple comparison test. n = 5
experiments. Error bars represent s.e.m.
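The normalizations defined in panels b, d and f reduce to masked sums over segmented images. A sketch with NumPy, assuming binary masks for parkin and for LC3-GFP puncta have already been produced by a thresholding step that the legend does not specify; the function names and toy arrays are hypothetical:

```python
import numpy as np


def colocalized_intensity_per_area(signal, parkin_mask):
    """Integrated signal inside the parkin mask divided by the mask area (pixels)."""
    mask = np.asarray(parkin_mask, dtype=bool)
    area = mask.sum()
    if area == 0:
        raise ValueError("empty parkin mask")
    return float(np.asarray(signal, dtype=float)[mask].sum() / area)


def colocalized_area_fraction(puncta_mask, parkin_mask):
    """Area of puncta overlapping the parkin mask divided by the parkin mask area."""
    puncta = np.asarray(puncta_mask, dtype=bool)
    parkin = np.asarray(parkin_mask, dtype=bool)
    return float((puncta & parkin).sum() / parkin.sum())


signal = np.array([[10.0, 0.0], [5.0, 5.0]])
parkin = np.array([[1, 0], [1, 0]])
print(colocalized_intensity_per_area(signal, parkin))  # 7.5
print(colocalized_area_fraction(np.array([[1, 1], [0, 0]]), parkin))  # 0.5
```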
Extended Data Figure 3 | mt-Keima imaging of mitophagy; PINK1 acts
upstream of parkin in the mitophagy pathway. a, mt-Keima differentially
highlights cytoplasmic (green) and lysosomal (red) mitochondria. Cultured
hippocampal neurons were transfected with mt-Keima and GFP. Following
2 days of expression, cells were imaged with 458 nm (shown in green) or
543 nm (shown in red) light excitation. GFP signal was used to outline the
cell (shown in white). Scale bar, 5 µm. b, mt-Keima imaging in cultured
hippocampal neurons before and after NH4Cl treatment (50 mM, 2 min).
mt-Keima signal, collected with 543 nm or 458 nm laser excitation, is shown in red
and green, respectively. Neutralizing cells with NH4Cl completely reversed the
high-ratio (543 nm/458 nm) signal to low-ratio signal specifically in the round
structures without affecting the tubular-reticular mitochondrial signal. Scale
bar, 5 µm. c, Imaging of mt-Keima and LysoTracker (LysoTracker Green
DND-26 shown in greyscale) in hippocampal neurons, showing that LysoTracker
stained the high-ratio mt-Keima structures. Scale bar, 5 µm. d, Post hoc
immunostaining for endogenous Lamp1 in neurons imaged for mt-Keima
signal, showing the colocalization of high-ratio mt-Keima pixels with Lamp1
staining. Immediately following mt-Keima imaging, cells were fixed and
stained with anti-Lamp1 antibody (shown in greyscale). Scale bar, 5 µm.
e, Quantification of mitophagy index following 1, 3 and 6-7 days of mt-Keima
expression in cultured hippocampal neurons. **P < 0.01 and ***P < 0.001 by
Kruskal-Wallis test and Dunn’s multiple comparison test. n = 29-85 cells.
n = 2-4 experiments. Error bars represent s.e.m. f, g, Immunoblots of HEK-293
cell lysates transfected with the indicated cDNA and parkin (f) or PINK1
(g) shRNA constructs. PSD-95-Flag was co-transfected as a control.
Representative blots from three independent experiments are shown.
h, i, Immunoblots of endogenous parkin (h) and PINK1 (i) in cultured
hippocampal neurons infected with adeno-associated virus expressing the
indicated shRNAs. Representative blots from two independent experiments are
shown. j, mt-Keima imaging in neurons transfected with PINK1-GFP and
parkin shRNA 1 (β-Gal and luciferase shRNA as controls). Scale bar, 5 µm.
k, Quantification of mitophagy index from j. ***P < 0.001 by Kruskal-Wallis
test and Dunn’s multiple comparison test. n = 55-75 cells. n = 3 experiments.
Error bars represent s.e.m. l, mt-Keima imaging in neurons transfected with
GFP-parkin or GFP control. Scale bar, 5 µm. m, Quantification of mitophagy
index from l. P = 0.22 by Mann-Whitney test. n = 37-43 cells. n = 3
experiments. Error bars represent s.e.m. n, Mitochondria-targeted GFP
(mito-GFP) imaging in neurons transfected with luciferase shRNA or USP30
shRNA constructs. Scale bar, 10 µm. Higher magnification images are shown in the
bottom panel. Scale bar, 5 µm. o, Quantification of fold change in area of
individual dendritic mitochondria from n. ***P < 0.001 by Mann-Whitney
test. n = 9 experiments. Error bars represent s.e.m.
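The legend does not define the mitophagy index itself. One plausible reading, consistent with panels a-d, is the percentage of mt-Keima-positive area whose 543 nm/458 nm excitation ratio is high (lysosomal). The sketch below implements that assumed definition with NumPy; the threshold is a free parameter (one option, also an assumption, is to set it from NH4Cl-neutralized cells, in which the high-ratio signal disappears), and the function name and arrays are hypothetical:

```python
import numpy as np


def mitophagy_index(img_543, img_458, mito_mask, ratio_threshold):
    """Percent of mt-Keima-positive area whose 543/458 excitation ratio exceeds a threshold."""
    ex543 = np.asarray(img_543, dtype=float)
    ex458 = np.asarray(img_458, dtype=float)
    mask = np.asarray(mito_mask, dtype=bool)
    # Pixel-wise ratio; pixels with no 458 nm signal are left at 0.
    ratio = np.divide(ex543, ex458, out=np.zeros_like(ex543), where=ex458 > 0)
    high = (ratio > ratio_threshold) & mask
    return 100.0 * high.sum() / mask.sum()


ex543 = np.array([[4.0, 1.0], [1.0, 0.0]])
ex458 = np.array([[1.0, 1.0], [1.0, 0.0]])
mask = np.array([[1, 1], [1, 0]])
print(round(mitophagy_index(ex543, ex458, mask, ratio_threshold=2.0), 2))  # 33.33
```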
Extended Data Figure 4 | USP30 opposes autophagic flux. a, Immunoblots
of cell lysates from HEK-293 cells transfected with GFP-parkin and the
indicated control (β-Gal) or USP30 constructs. b, Quantification of the LC3-II
and p62 immunoblot signal from a, normalized to actin. **P < 0.01 by
Kruskal-Wallis test and Dunn’s multiple comparison test. n = 6 experiments.
Error bars represent s.e.m. c, Immunoblots of cell lysates from HEK-293 cells
transfected with GFP-parkin and β-Gal or wild-type USP30 constructs, as
indicated. 24 h after transfection, cells were treated with bafilomycin (100 nM,
0-8 h). d, Quantification of the LC3-II immunoblot signal from c, normalized
to actin. P = 0.97 by two-way ANOVA and Bonferroni’s multiple comparison
test. n = 5 experiments. Error bars represent s.e.m. e, Immunoblots of cell
lysates from HEK-293 cells transfected with GFP-parkin and control
(luciferase) or USP30 shRNA constructs. f, Quantification of the LC3-II and
p62 immunoblot signal from e, normalized to actin. g, Immunoblots of cell
lysates from HEK-293 cells transfected with GFP-parkin and control
(luciferase) or USP30 shRNA constructs. 6 days after transfection, cells were
treated with bafilomycin (100 nM, 0-8 h). h, Quantification of the LC3-II
immunoblot signal from g, normalized to actin. **P < 0.01 by two-way
ANOVA and Bonferroni’s multiple comparison test. n = 4 experiments. Error
bars represent s.e.m. i, j, Immunoblots of cell lysates from HEK-293 cells
transfected with GFP-parkin and the indicated control (β-Gal) or USP30
constructs. 24 h after transfection, cells were treated with CCCP (20 µM, 0-6 h).
β-Gal-transfected cells were also treated with bafilomycin (100 nM, 0-6 h) as a
control (shown in j). k, Quantification of the p62 immunoblot signal from
i, j, normalized to actin. *P < 0.05 and **P < 0.01 by two-way ANOVA and
Bonferroni’s multiple comparison test. n = 6 experiments. Error bars represent
s.e.m. l, Immunoblots of cell lysates from HEK-293 cells transfected with
GFP-parkin and control (luciferase) or USP30 knockdown constructs. 24 h after
transfection, cells were treated with CCCP (20 µM, 0-8 h) and bafilomycin
(100 nM), as indicated. m, Quantification of the p62 immunoblot signal from
l, normalized to actin. *P < 0.05 by two-way ANOVA and Bonferroni’s
multiple comparison test. n = 6 experiments. Error bars represent s.e.m.
Extended Data Figure 5 | USP30 deubiquitinates multiple mitochondrial
proteins. a, Proteins whose ubiquitination is regulated by both USP30 and
parkin. Asymmetric ‘volcano plot’ showing the subset of 41 proteins whose
ubiquitination was significantly increased (P < 0.05) by both GFP-parkin
overexpression (right side) and USP30 knockdown (left side) in ‘combo’
treatments compared to ‘CCCP treatment’ alone. ‘Combo’ refers to cells
treated with ‘CCCP + GFP-parkin’ or ‘CCCP + USP30 shRNA’. For this
subset of proteins, fold increase in ubiquitination (x axis) and the P value
(y axis) are reported. Mitochondrial proteins (as identified by the Human
MitoCarta database) are shown in red. Fold changes and P values for all
proteins with quantified K-GG peptides are reported in Supplementary Table 1.
b, Immunoblots of anti-HA immunoprecipitates for endogenous MIRO1 and
TOM20 in the parental HEK-293 cell line (which lacks GFP-parkin) transfected with
HA-ubiquitin and the indicated Flag-tagged USP30 constructs (β-Gal as
control). 24 h after transfection, cells were treated with CCCP (5 µM, 2 h) and
ubiquitinated proteins were immunoprecipitated with anti-HA beads.
Immunoprecipitates and inputs were blotted with the indicated antibodies.
n = 2 experiments. c, Immunoblots of total lysates of GFP-parkin HEK-293
stable cells that were transfected with the indicated Flag-tagged USP30
constructs and then treated with CCCP (5 µM, 0-6 h). d, Quantification of
MIRO1 and TOM20 immunoblot signals from c, normalized to actin.
Changes in immunoblot signals for all other proteins (VDAC, Mfn-1, Tom70, HSP60,
TIMM8a) did not reach significance. *P < 0.05, **P < 0.01, ***P < 0.001
compared to β-Gal control, by two-way ANOVA and Bonferroni’s multiple
comparison test. n = 3-5 experiments. e, Immunoblots of anti-HA
immunoprecipitates for endogenous MIRO1 and TOM20 with USP30
knockdown. HEK-293 cells stably expressing GFP-parkin were transfected as
indicated with HA-ubiquitin, human USP30 shRNA and rat USP30-Flag
cDNA that is insensitive to the shRNA (luciferase shRNA and β-Gal as
controls). 6 days after transfection, cells were processed as in b. n = 2
experiments.
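The selection rule behind panel a (ubiquitination significantly increased, P < 0.05, in both the parkin-overexpression and USP30-knockdown ‘combo’ comparisons) amounts to a set intersection. A minimal sketch; the input format (protein → fold increase, P value) and the protein names are hypothetical, and treating ‘increased’ as fold > 1 is an assumption:

```python
from typing import Dict, List, Tuple

# protein name -> (fold increase in ubiquitination vs. CCCP alone, P value)
Comparison = Dict[str, Tuple[float, float]]


def shared_increases(parkin_combo: Comparison, usp30_combo: Comparison,
                     alpha: float = 0.05) -> List[str]:
    """Proteins whose ubiquitination rose significantly in both comparisons."""
    def increased(stats: Comparison, name: str) -> bool:
        fold, p = stats[name]
        return fold > 1.0 and p < alpha

    common = parkin_combo.keys() & usp30_combo.keys()
    return sorted(n for n in common if increased(parkin_combo, n) and increased(usp30_combo, n))


parkin = {"protein_A": (3.0, 0.01), "protein_B": (2.0, 0.20), "protein_C": (1.5, 0.03)}
usp30 = {"protein_A": (2.5, 0.02), "protein_C": (0.8, 0.01), "protein_D": (4.0, 0.001)}
print(shared_increases(parkin, usp30))  # ['protein_A']
```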
Extended Data Figure 6 | TOM20 activates mitophagy through
ubiquitination; USP30 is a parkin substrate. a, Extracted ion chromatograms
corresponding to K-GG peptides identified from TOM20 in the USP30
knockdown mass spectrometry. Relative abundance of each ubiquitinated
peptide is shown on the y axis relative to the most abundant analysis, with
precursor ion m/z indicated above each peak. The sequence of each K-GG
peptide is shown below in green. Asterisks denote modified lysine residues.
b, Immunoblots of HA-ubiquitin precipitates from GFP-parkin HEK-293 cells
transfected with the indicated constructs. Following transfection and treatment
with CCCP (5 µM, 2 h), ubiquitinated proteins were immunoprecipitated with
anti-HA beads, and precipitates and inputs were blotted with the indicated
antibodies. n = 3 experiments. c, mt-Keima imaging in neurons transfected
with the indicated TOM20-MYC and USP30 constructs (β-Gal as control).
Scale bar, 5 µm. d, Quantification of mitophagy index from c. ***P < 0.001 by
Kruskal-Wallis test and Dunn’s multiple comparison test. n = 67-80 cells for
all groups. n = 3 experiments. Error bars represent s.e.m. e, Extracted ion
chromatograms corresponding to K-GG peptides identified from USP30 in the
parkin overexpression mass spectrometry, displayed as in a. f, Immunoblots of
anti-HA immunoprecipitates for endogenous USP30 from cells transfected with
wild-type, K161N and G430D GFP-parkin constructs. After 24 h of expression,
cells were treated with CCCP (20 µM, 2 h) and ubiquitinated proteins
were immunoprecipitated with anti-HA beads. Immunoprecipitates and
inputs were blotted with the indicated antibodies. g, Quantification of
immunoblot signal for co-immunoprecipitated USP30 from f. Protein levels
co-immunoprecipitating with anti-HA beads are normalized to the ‘wild-type
GFP-parkin + CCCP’ group. ***P < 0.001 by one-way ANOVA and
Dunnett’s multiple comparison test, compared to ‘wild-type GFP-parkin +
CCCP’. n = 5 experiments. Error bars represent s.e.m. h, Immunoblots of
lysates prepared from HEK-293 cells transfected with the indicated GFP
and GFP-parkin constructs and treated with CCCP (20 µM, 0-6 h).
i, Quantification of immunoblot signal for USP30 from h, normalized to actin.
**P < 0.01, ***P < 0.001 compared to wild-type GFP-parkin, by two-way
ANOVA and Bonferroni’s multiple comparison test. n = 4 experiments. Error
bars represent s.e.m. j, Immunoblots of lysates prepared from HEK-293 cells
transfected with GFP-parkin and treated as indicated (CCCP 20 µM, 6 h;
bafilomycin (100 nM), MG132 (20 µM) and epoxomicin (2 µM) were added
15 min before CCCP treatment). k, Quantification of immunoblot signal for
USP30 from j, normalized to actin. *P < 0.05 and ***P < 0.001 by one-way
ANOVA and Dunnett’s multiple comparison test, compared to ‘DMSO +
CCCP’. n = 4 experiments. Error bars represent s.e.m.
Extended Data Figure 7 | USP30 knockdown rescues mitophagy defects in
cells expressing mutant parkin. a, Immunoblot for endogenous USP30 in
SH-SY5Y cells transfected with USP30 siRNA for 3 days. b, c, Immunostaining in
SH-SY5Y cells stably expressing GFP-parkin(G430D), transfected with
siRNAs against USP30, USP6 or USP14. 3 days after transfection, cells were
treated with CCCP (20 µM, 24 h), then fixed and stained for GFP and
endogenous TOM20. Scale bars, 5 µm. d, Quantification of fold change in
TOM20 staining intensity from b and c, normalized to control siRNA.
**P < 0.01 by Kruskal-Wallis test and Dunn’s multiple comparison test. n = 3
experiments. Error bars represent s.e.m. e, Immunostaining of SH-SY5Y cells
expressing GFP-parkin(G430D), transfected as indicated, and treated with
CCCP (20 µM, 24 h). Rat USP30 cDNA is insensitive to human USP30 siRNA.
f, Quantification of fold change in TOM20 intensity from e. Kruskal-Wallis
test, n = 3 experiments. g, Immunostaining of SH-SY5Y cells expressing
GFP-parkin(K161N) and transfected with USP30 siRNA. Following 3 days of
knockdown, cells were treated with CCCP (20 µM, 24 h), then fixed and
stained for GFP and endogenous TOM20 and HSP60. Scale bars, 5 µm.
h, Quantification of fold change in TOM20 or HSP60 staining intensity from
g, normalized to control siRNA. *P < 0.05 by Mann-Whitney test, n = 4
experiments. Error bars represent s.e.m. i, k, Immunostaining of SH-SY5Y cells
expressing GFP-parkin(K161N) and transfected with USP30 siRNA.
Following 3 days of knockdown, cells were treated with CCCP (20 µM, 4 h),
then fixed and stained for GFP and endogenous p62 (i) or LC3 (k).
Co-localization of GFP-parkin (shown in green) and p62 or LC3 (shown in red) is
shown in the lower panel. Scale bars, 5 µm. j, l, Quantification of
GFP-parkin(K161N)-associated p62 (j) or LC3 (l) staining intensity, normalized by
GFP-parkin(K161N) area, from i, k. **P < 0.01 by Mann-Whitney test,
n = 9-10 experiments. Error bars represent s.e.m. m, mt-Keima imaging in
neurons transfected with PINK1 shRNA and USP30(C77A)-Flag. Scale bar,
5 µm. n, Quantification of mitophagy index from m. Kruskal-Wallis test.
n = 127-166 cells. n = 7 experiments. Error bars represent s.e.m.
Extended Data Figure 8 | USP30 knockdown decreases oxidative stress in
neurons and rescues mitochondrial morphology defects in PINK1 mutant
flies. a, Ratiometric mito-roGFP imaging in hippocampal neurons transfected
with USP30 shRNA. Following measurement of ratiometric mito-roGFP signal
in individual cells, the dynamic range of the probe was calibrated by treating
cultures sequentially with DTT (1 mM) to fully reduce the probe and aldrithiol
(100 µM) to fully oxidize the probe’’. The ‘relative oxidation index’ is shown
on a colour scale from 0 (mito-roGFP ratio after DTT treatment, shown
in black) to 1 (mito-roGFP ratio after aldrithiol treatment, shown in red).
b, Quantification of relative oxidation index from a. ***P < 0.001 by
Mann-Whitney test. n = 24 cells for luciferase shRNA and 36 cells for USP30 shRNA.
n = 3 experiments. Error bars represent s.e.m. c, Quantitative RT-PCR of
dUSP30 mRNA in Actin-GAL4, UAS-dUSP30^RNAi and
Actin-GAL4 > UAS-dUSP30^RNAi flies, shown relative to Actin-GAL4. dUSP30
mRNA levels were normalized to internal control Drosophila RpII140 mRNA
levels. ***P < 0.001 by one-way ANOVA and Dunnett’s multiple comparison
test. n = 3 experiments. Error bars represent s.e.m. d, Transverse sections of
Drosophila indirect flight muscles of indicated genotypes. Arrowheads,
electron-dense mitochondria; dashed lines, ‘pale’ mitochondria with
disorganized cristae. Scale bars, 1 µm (top), 0.2 µm (bottom panels).
e, f, Quantification of mitochondrial morphology (e) and size distribution
(f) from d. Mann-Whitney test (e). Kolmogorov-Smirnov test, pink1^B9 versus
‘pink1^B9 + dUSP30 knockdown’ (f). n = 4 flies per genotype. g, Transverse
sections of indirect flight muscles (IFMs) from vehicle- or paraquat-treated flies
of indicated genotypes. Scale bar, 0.5 µm.
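The relative oxidation index in Extended Data Fig. 8a is a two-point linear rescaling of the mito-roGFP ratio between its fully reduced (DTT) and fully oxidized (aldrithiol) values. A minimal sketch; which excitation ratio is used is not stated here, and the function name and numbers are hypothetical:

```python
def relative_oxidation_index(ratio: float, ratio_dtt: float, ratio_aldrithiol: float) -> float:
    """0 = fully reduced (ratio after DTT), 1 = fully oxidized (ratio after aldrithiol)."""
    span = ratio_aldrithiol - ratio_dtt
    if span == 0:
        raise ValueError("probe shows no dynamic range")
    return (ratio - ratio_dtt) / span


# Hypothetical ratios from one cell before and after the calibration steps.
print(relative_oxidation_index(1.5, ratio_dtt=1.0, ratio_aldrithiol=3.0))  # 0.25
```

Calibrating each cell against its own DTT and aldrithiol endpoints removes cell-to-cell differences in probe expression from the comparison.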
Extended Data Figure 9 | Neurodegeneration was not observed in genetic
parkin fly models; dUSP30 knockdown protects against paraquat-induced
climbing and dopamine deficits. a, c, e, Representative images of the indicated
dopaminergic neuron clusters in flies with indicated genotypes. Scale bars,
10 µm. b, d, f, Blind quantification for panels a, c, e. P values calculated by
Student’s t-test (f) and one-way ANOVA and Bonferroni’s multiple
comparison test (b, d). n = 4-5 hemibrains per genotype. Similar results were
obtained with additional counts performed for the PPL1 cluster, n = 18-40
hemibrains per genotype. Error bars represent s.e.m. g, Dopamine levels in fly
brains for the indicated genotypes. n = 16 flies per genotype. P values calculated
by one-way ANOVA and Bonferroni’s multiple comparison test. h, Climbing
assay in control flies (Actin-GAL4). Flies were treated with vehicle control (5%
sucrose) or paraquat (10 mM, 48 h). L-3,4-dihydroxyphenylalanine
(1 mM, 48 h) was administered simultaneously with paraquat, as indicated.
Graph shows % of flies climbing 15 cm in 30 s. **P < 0.01 by Kruskal-Wallis
test and Dunn’s multiple comparison test. n = 6 experiments. Error bars
represent s.e.m. i, Serotonin levels per fly head, as assessed by ELISA. Flies
were treated with paraquat (10 mM, 48 h) or vehicle control (5% sucrose).
P values calculated by Kruskal-Wallis test and Dunn’s multiple comparison
test. n = 8 heads, n = 2 experiments. Error bars represent s.e.m. j, Climbing
assay of dUSP30 knockdown flies driven by Th-GAL4. Flies were treated with
paraquat (10 mM, 48 h) or vehicle control (5% sucrose). Graph shows % of flies
climbing 15 cm in 30 s. *P < 0.05 by Kruskal-Wallis test and Dunn’s multiple
comparison test. n = 4 experiments. Error bars represent s.e.m. k, Climbing
assay of dUSP30 knockdown flies driven by Actin-GAL4. Flies were treated with
paraquat (10 mM, 48 h) or vehicle control (5% sucrose). Graph shows % of flies
climbing 15 cm in 30 s. **P < 0.01 and ***P < 0.001 by one-way ANOVA and
Bonferroni’s multiple comparison test. n = 6-10 experiments. Error bars
represent s.e.m. l, m, Normalized dopamine levels per fly head, as assessed by
ELISA. Flies of the indicated genotype were treated with paraquat (10 mM,
48 h) or vehicle control (5% sucrose). *P < 0.05, **P < 0.01 and ***P < 0.001
by Mann-Whitney test. n = 8-28 heads. Error bars represent s.e.m.
Extended Data Figure 10 | Knockdown of DUBs dYOD1 or dUSP47 in flies
does not provide protection against paraquat; hUSP30 overexpression
reverses dUSP30 knockdown benefits. a, b, Quantitative RT-PCR
measurement of dUSP47 (a) and dYOD1 (b) mRNA levels in flies of the
indicated genotypes, expressed relative to the Actin-GAL4 genotype. **P < 0.01
and ***P < 0.001 by one-way ANOVA and Dunnett’s multiple comparison
test. n = 3 technical replicates. Error bars represent s.e.m. c, Survival curves of
flies with dUSP47 or dYOD1 knockdown, treated with vehicle (5% sucrose) or
paraquat (10 mM). Graph shows percent of flies alive at indicated times.
*P < 0.05 and **P < 0.01 by two-way ANOVA and Bonferroni’s multiple
comparison test. n = 5-7 experiments. Error bars represent s.e.m. d, Survival
curves of flies with dUSP30 knockdown driven by Th-GAL4, treated with
paraquat (10 mM). Graph shows percent of flies alive at indicated times after
feeding with paraquat. **P < 0.01 and ***P < 0.001 by two-way ANOVA and
Bonferroni’s multiple comparison test. n = 3 experiments. Error bars represent
s.e.m. e, f, Quantitative RT-PCR measurement of hUSP30 and dUSP30 mRNA
levels in flies of the indicated genotypes. **P < 0.01 and ***P < 0.001 by
one-way ANOVA and Dunnett’s multiple comparison test. n = 4 experiments.
Error bars represent s.e.m. g, Climbing assay for flies overexpressing hUSP30.
Flies of indicated genotypes were fed with vehicle (5% sucrose) or paraquat
(10 mM, 48 h); graph shows percent of flies climbing 15 cm in 30 s. *P < 0.05 by
Kruskal-Wallis test and Dunn’s multiple comparison test. n = 4 experiments.
Error bars represent s.e.m. h, Survival assay for flies overexpressing hUSP30.
Flies were fed paraquat (10 mM); graph shows % live flies at indicated times.
*P < 0.05 and ***P < 0.001 by two-way ANOVA and Bonferroni’s multiple
comparison test. n = 4-11 experiments. Error bars represent s.e.m.
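The qRT-PCR panels report mRNA levels normalized to an internal control gene (Extended Data Fig. 8c names RpII140) and expressed relative to Actin-GAL4. A common way to compute this is the comparative Ct (2^−ΔΔCt) method; the legends do not state which method was used, so the sketch below, its assumption of ideal (2-fold per cycle) amplification and its Ct values are illustrative only:

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float,
                        amplification: float = 2.0) -> float:
    """Fold change vs. the calibrator genotype by the comparative Ct (2^-ddCt) method."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return amplification ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: knockdown line vs. Actin-GAL4 calibrator, RpII140 as reference.
print(relative_expression(26.0, 20.0, 24.0, 20.0))  # 0.25
```

A result of 0.25 corresponds to a 75% reduction of the target mRNA relative to the calibrator genotype.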
Measurement of the magnetic interaction between
two bound electrons of two separate ions 


Shlomi Kotler, Nitzan Akerman, Nir Navon, Yinnon Glickman & Roee Ozeri


Electrons have an intrinsic, indivisible, magnetic dipole aligned with 
their internal angular momentum (spin). The magnetic interaction 
between two electronic spins can therefore impose a change in their 
orientation. Similar dipolar magnetic interactions exist between other 
spin systems and have been studied experimentally. Examples include 
the interaction between an electron and its nucleus and the interac- 
tion between several multi-electron spin complexes’ *. The challenge 
in observing such interactions for two electrons is twofold. First, at 
the atomic scale, where the coupling is relatively large, it is often dom- 
inated by the much larger Coulomb exchange counterpart’. Second, 
on scales that are substantially larger than the atomic, the magnetic cou- 
pling is very weak and can be well below the ambient magnetic noise. 
Here we report the measurement of the magnetic interaction between 
the two ground-state spin-1/2 valence electrons of two ⁸⁸Sr⁺ ions, co-
trapped in an electric Paul trap. We varied the ion separation, d, between
2.18 and 2.76 micrometres and measured the electrons’ weak, milli- 
hertz-scale, magnetic interaction as a function of distance, in the pre- 
sence of magnetic noise that was six orders of magnitude larger than 
the magnetic fields the electrons apply on each other. The coopera- 
tive spin dynamics was kept coherent for 15 seconds, during which 
spin entanglement was generated, as verified by a negative measured
value of −0.16 for the swap entanglement witness. The sensitivity
necessary for this measurement was provided by restricting the spin
evolution to a decoherence-free subspace that is immune to collective
magnetic field noise. Our measurements show a $d^{-3.0(4)}$ distance
dependence for the coupling, consistent with the inverse-cube law.
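The distance exponent quoted in the abstract comes from fitting the measured coupling against ion separation. A minimal sketch of such a fit, as an unweighted least-squares line in log-log space, checked on synthetic inverse-cube data (not the measured values); the paper's own fitting procedure and its uncertainty estimate are not reproduced here, and the function name is hypothetical:

```python
import math
from typing import Sequence, Tuple


def fit_power_law(d: Sequence[float], coupling: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (log d, log coupling); returns (prefactor, exponent)."""
    xs = [math.log(v) for v in d]
    ys = [math.log(v) for v in coupling]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(y_bar - exponent * x_bar)
    return prefactor, exponent


# Synthetic, noise-free couplings following an exact inverse-cube law (not measured data).
separations_um = [2.18, 2.4, 2.6, 2.76]
couplings = [10.0 * d ** -3 for d in separations_um]
print(round(fit_power_law(separations_um, couplings)[1], 6))  # -3.0
```

With real, noisy data the fitted exponent scatters around −3, and its standard error sets the uncertainty on the power law.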

Early during the twentieth century, a number of experiments indicated 
that the electron is more than just an electrically charged point particle. 
Introducing the electron spin and its accompanying magnetic moment 
explained a multitude of experimental observations, such as the fine- 
structure spectrum of hydrogen, anomalous Zeeman splitting and the 
famous Stern–Gerlach experiment. Since then, the magnetic field of a
single electron has been detected® and its magnetic dipole measured with 
unprecedented accuracy’. 

Because a single electron is a tiny magnet, every two electrons should 
influence each other’s magnetic dipole orientation, just like magnets do. 
However, because electrons are indistinguishable particles, this interac- 
tion competes with the Coulomb spin-exchange forces which are dominant 
on the atomic scale. This can be resolved by increasing the inter-electron 
separation, d. Although the magnetic energy then becomes dominant, it also
decreases as $d^{-3}$. Therefore, such an approach can be fruitful only when
accompanied by an appropriate increase in the magnetic dipole moment 
or an improvement in the measurement sensitivity. With recent advances 
in magnetometry on the scale of tens of nanometres, the magnetic
interaction of two nitrogen-vacancy spin-1 defects in diamond has been
observed to result in their entanglement*, and weak interaction strengths, as
low as 60 Hz, have been measured’. A comparable magnetic interaction
strength was observed between atoms in dipolar quantum gases”, where the 
magnetic dipole of each atom ranged from six to ten times that of the electron. 

In this work we used two trapped ⁸⁸Sr⁺ ions, each with a single
valence electron and no nuclear spin’. These bound electrons inherited 


the well-isolated environment of their ions along with a high degree of 
controllability (Methods Summary). Indeed, ions can be tightly con- 
fined and laser-cooled to their mechanical ground state’, allowing for 
the long interrogation times necessary for weak signal measurements. 
Examples include state-of-the-art detection of electric’”"” and magnetic?” 
fields, and the detection of gravitational time dilation’’. The relative mag- 
netic dipole correction imposed by using bound rather than free elec- 
trons is smaller than 0.0018% (ref. 19), which is well below our reported 
sensitivity. 

We now describe the magnetic dipolar interaction and competing noise. 
As shown in Fig. 1a, we aligned the external magnetic field along the line 
connecting the two ions. The spin part of the two-ion Hamiltonian can 
be written as 


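$$H = \frac{\hbar}{2}\left(\omega_{A,1}\sigma_{z,1} + \omega_{A,2}\sigma_{z,2}\right) + 2\hbar\zeta\,\sigma_{z,1}\sigma_{z,2} - \hbar\zeta\left(\sigma_{x,1}\sigma_{x,2} + \sigma_{y,1}\sigma_{y,2}\right) \qquad (1)$$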


Here $\hbar$ is the Planck constant divided by $2\pi$; $\sigma_{j,i}$ is the $j \in \{x, y, z\}$ Pauli
spin operator of the $i$th spin; $\omega_{A,i} = g\mu_B B_i/\hbar$ is the spin Larmor frequency,
where $B_i$, $\mu_B$ and $g$ are the external magnetic field, Bohr magneton and
the electron spin gyromagnetic ratio, respectively. The spin-spin interac- 
tion strength is € = Lo(gup/ 2)7/4nhd*, with po the vacuum permeability 
constant. The first term on the right-hand side of equation (1) describes 
the Zeeman shift of the spins’ energy due to the magnetic field. The second 
and third terms are due to spin-spin interactions. The second term creates 
a shift in the resonance frequency of one spin that is conditioned on the 
state of the other, and was recently measured for the case of two nitrogen— 
vacancy spin-1 defects**. The third term results in a collective spin flip in 
which a spin excitation is exchanged. Owing to conservation of energy, 
for this term to be on-resonance and effective, the two spins have to be 
exactly degenerate, that is, B has to be exactly uniform. It is the third term 
which was at the focus of our experiment. 
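Equation (1) and the expression for $\xi$ can be checked numerically. The sketch below is a minimal illustration, not the authors' analysis code: it writes equation (1) in units of $\hbar$ as a 4 × 4 matrix in the basis $|\uparrow\uparrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\downarrow\downarrow\rangle$, evaluates $\xi$ at $d = 2.41$ μm with CODATA constants and the free-electron g-factor (an assumption of the sketch), and confirms that the Bell states are split by $4\xi$ regardless of a uniform field. The 13.16 MHz Larmor frequency is only a representative value.

```python
import numpy as np

# CODATA values (SI units)
MU_0 = 1.25663706212e-6     # vacuum permeability, N A^-2
MU_B = 9.2740100783e-24     # Bohr magneton, J T^-1
HBAR = 1.054571817e-34      # reduced Planck constant, J s
G_E = 2.00231930436         # free-electron g-factor (assumed for the bound electron)


def coupling(d):
    """xi = mu_0 (g mu_B / 2)^2 / (4 pi hbar d^3), in rad/s, for separation d in metres."""
    return MU_0 * (G_E * MU_B / 2) ** 2 / (4 * np.pi * HBAR * d ** 3)


# Single-spin Pauli matrices in the basis {|up>, |down>}
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)


def hamiltonian(w1, w2, xi):
    """Equation (1) divided by hbar, in the basis |uu>, |ud>, |du>, |dd>."""
    zeeman = 0.5 * (w1 * np.kron(SZ, I2) + w2 * np.kron(I2, SZ))
    ising = 2 * xi * np.kron(SZ, SZ)
    flip_flop = -xi * (np.kron(SX, SX) + np.kron(SY, SY))
    return zeeman + ising + flip_flop


k = coupling(2.41e-6)
print(f"xi / 2pi at d = 2.41 um: {k / (2 * np.pi) * 1e3:.3f} mHz")

w = 2 * np.pi * 13.16e6          # representative Larmor frequency, uniform field
H = hamiltonian(w, w, k)
psi_plus = np.array([0, 1, 1, 0]) / np.sqrt(2)
psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)
e_plus = np.real(psi_plus @ H @ psi_plus)      # expectation values (real states)
e_minus = np.real(psi_minus @ H @ psi_minus)
print(f"splitting / (4 xi) = {(e_minus - e_plus) / (4 * k):.6f}")
```

With the free-electron g-factor, the computed coupling at 2.41 μm comes out close to the $2\pi \times 0.93$ mHz quoted later in the text.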

Ultimately, the ability to measure a weak magnetic spin-spin interaction is limited by collective external magnetic field fluctuations, described by the first term in equation (1). Typical laboratory magnetic field noise amplitudes are of the order of 0.1 μT, which is equivalent to fluctuations of a few kilohertz in $\omega_{\mathrm{A},i}$ ($i = 1, 2$). These fluctuations are six orders of magnitude greater than the spin-spin interaction strength.
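The six-orders-of-magnitude comparison follows from the Larmor-frequency definition given with equation (1). This sketch converts a 0.1 μT field fluctuation into a shift of $\omega_{\mathrm{A},i} = g\mu_{\mathrm{B}}B_i/2\hbar$, expressed in hertz, and compares it with $\xi/2\pi \approx 0.93$ mHz; the free-electron g-factor is an assumption.

```python
import math

MU_B = 9.2740100783e-24     # Bohr magneton, J T^-1
HBAR = 1.054571817e-34      # reduced Planck constant, J s
G_E = 2.00231930436         # free-electron g-factor (assumed)


def larmor_shift_hz(delta_b):
    """Change of omega_A = g mu_B B / (2 hbar), in Hz, for a field change delta_b in tesla."""
    return G_E * MU_B * delta_b / (2 * HBAR) / (2 * math.pi)


noise_hz = larmor_shift_hz(0.1e-6)    # 0.1 uT of laboratory field noise
coupling_hz = 0.93e-3                 # xi / 2 pi at d = 2.41 um, from the text
orders = math.log10(noise_hz / coupling_hz)
print(f"noise ~ {noise_hz / 1e3:.1f} kHz, coupling ~ {coupling_hz * 1e3:.2f} mHz, "
      f"ratio ~ 10^{orders:.1f}")
```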

A state-space solution can remedy the effect of these large magnetic fluctuations. It requires identifying a set of quantum states that are, on the one hand, sensitive to the desired signal and, on the other hand, invariant under a certain class of noise processes. Previously this approach was used to measure magnetic field gradients as well as narrow laser linewidths and the electric quadrupole shift of atomic transitions. Here we tailored the states to the magnetic dipolar interaction. The four eigenstates of the Hamiltonian in equation (1) are $|\uparrow\uparrow\rangle$, $|\downarrow\downarrow\rangle$ and the two entangled Bell states $|\Psi_\pm\rangle = (|\uparrow\downarrow\rangle \pm |\downarrow\uparrow\rangle)/\sqrt{2}$. The first two eigenstates are twice as susceptible to magnetic field fluctuations as are the single-spin states, whereas the energy splitting between the latter two is $4\hbar\xi$ and does not depend on $B$ at all (see Fig. 1b for an energy level diagram).
By restricting the spin-spin evolution to the decoherence-free subspace 
(DFS) spanned by $|\Psi_\pm\rangle$, it is possible to observe spin-spin interactions without susceptibility to spatially homogeneous magnetic noise.

Spin-spin interaction within the DFS takes a simple form that can be understood in terms of the geometric Bloch sphere representation shown in Fig. 1c. In this subspace, and up to a constant energy offset, equation (1) takes the form $H_{\mathrm{int}} = -2\hbar\xi\left(|\uparrow\downarrow\rangle\langle\downarrow\uparrow| + |\downarrow\uparrow\rangle\langle\uparrow\downarrow|\right)$. The $|\Psi_\pm\rangle$ states are invariant under the interaction (Fig. 1b). All other states undergo rotation (Fig. 1c, solid blue arc) around the axis defined by $|\Psi_+\rangle$, hereafter referred to as the x direction (Fig. 1c). Starting from the north pole ($|\uparrow\downarrow\rangle$), the system rotates through the fully entangled state $|y_+\rangle = (|\uparrow\downarrow\rangle + i|\downarrow\uparrow\rangle)/\sqrt{2}$ and towards the south pole ($|\downarrow\uparrow\rangle$).

Even in the DFS, spatial inhomogeneity in the external magnetic field can obscure the spin-spin signal. By observing the energy separation between $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$ and compensating for inhomogeneities with external coils, we were able to reduce the gradients to as low as $\nabla B = 3.57 \times 10^{-7}$ T m$^{-1}$ (Supplementary Information). This, however, was still strong enough to lift the degeneracy between $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$ by $\Delta\omega_{\mathrm{A}} = (g\mu_{\mathrm{B}}/2\hbar)(\nabla B)d = 2\pi \times 20$ mHz, thus detuning the weak, millihertz spin-spin coupling from resonance and resulting in the Hamiltonian $H = H_{\mathrm{int}} + (\hbar/2)\Delta\omega_{\mathrm{A}}\left(|\uparrow\downarrow\rangle\langle\uparrow\downarrow| - |\downarrow\uparrow\rangle\langle\downarrow\uparrow|\right)$. In geometric terms, starting at the Bloch sphere north pole, the system state is rapidly rotated by the field gradient about the z axis (Fig. 1c, red arc). This counteracts the slower revolution around x imposed by the spin-spin interaction, restricting its effect to a narrow region of solid angle ~π/400 sr near the north pole.

Using a train of spin echoes, we were able to further reduce the effect of these excessive magnetic field inhomogeneities by two orders of magnitude, to a negligible level. During their magnetic spin-spin evolution, the two dipoles were flipped at a rate of 2 Hz. In geometric terms, this corresponds to a train of 180° rotations about the x axis (Fig. 1c). These collective rotations do not change the relative orientation of the spins, leaving the spin-spin interaction invariant (Fig. 1d, upper middle three spheres). The effect of the gradient, however, is averaged to zero, because exchanging $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$ is equivalent to constantly switching the sign of the magnetic field gradient (Fig. 1d, lower middle three spheres). This scheme is an adaptation of the quantum lock-in method.

Figure 1 | Experiment overview. a, Set-up schematics. Two $^{88}$Sr$^{+}$ ions are trapped in a linear radio-frequency Paul trap (only d.c. electrodes at potential $V_{\mathrm{cap}}$ are shown). b, Energy diagram of the two spins, with magnetic interaction eigenvectors $|\Psi_\pm\rangle = (|\uparrow\downarrow\rangle \pm |\downarrow\uparrow\rangle)/\sqrt{2}$. c, Geometric Bloch representation of the DFS, with $|y_\pm\rangle = (|\uparrow\downarrow\rangle \pm i|\downarrow\uparrow\rangle)/\sqrt{2}$. Spin-spin interaction induces x rotation (blue arc). Magnetic field gradients generate z rotations (red arc). For $d = 2.4$ μm, spin-spin coupling rotates $|\uparrow\downarrow\rangle$ to the fully entangled state $|y_+\rangle$ after an interaction time of 67 s. Our actual experiment duration was $T_{\mathrm{exp}} = 15$ s, corresponding to a 21.6° rotation. This angle is estimated by the parity visibility (black double arrow: projection of the Bloch vector on the equatorial plane). A collective spin flip corresponds to a 180° rotation about x. d, Experimental sequence. Infinitesimal spin evolution is depicted by the shaded sectors of the z–y and x–y projections of the Bloch sphere. After initialization to $|\uparrow\downarrow\rangle$, spin-spin evolution is interrupted by equidistant collective spin flips, restricting the effect of magnetic field gradients (bottom middle three spheres). The Bloch vector continuously accumulates an angle with respect to z (upper middle three spheres). Finally, a controlled magnetic gradient rotates the Bloch vector about the z axis by $\phi_{\mathrm{parity}}$ radians. The projection of the final Bloch vector on the x axis corresponds to the parity observable. e, Parity analysis fringe example (numerical).
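The interplay between the exchange rotation, the residual gradient and the spin-echo train can be reproduced with a two-level model of the DFS. In this minimal sketch (an illustration, not the authors' simulation), the DFS basis is $\{|\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle\}$, the Hamiltonian divided by $\hbar$ is $-2\xi X + (\Delta\omega_{\mathrm{A}}/2)Z$, and each collective spin flip is modelled as an ideal, instantaneous $X$ operation; the flips end each of 30 equal 0.5-s segments, matching a 2 Hz flip rate over 15 s. The overall sign of the exchange term does not affect the populations computed here.

```python
import numpy as np

# Pauli operators on the DFS basis {|ud>, |du>} (north and south poles of Fig. 1c)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def propagator(h_x, h_z, t):
    """exp(-i (h_x X + h_z Z) t) via the closed form for a real Pauli vector."""
    norm = np.hypot(h_x, h_z)
    if norm == 0.0:
        return I2.copy()
    return np.cos(norm * t) * I2 - 1j * np.sin(norm * t) * (h_x * X + h_z * Z) / norm


def transfer_probability(xi, d_omega, total_time, n_flips=0):
    """Population of |du> after starting in |ud>, for H / hbar = -2 xi X + (d_omega / 2) Z.

    With n_flips > 0, the evolution is split into n_flips equal segments and an ideal,
    instantaneous collective flip (X in the DFS) ends each segment. An even number of
    flips returns the state to the original labelling of |ud> and |du>.
    """
    n_segments = max(n_flips, 1)
    step = propagator(-2 * xi, d_omega / 2, total_time / n_segments)
    state = np.array([1, 0], dtype=complex)
    for _ in range(n_segments):
        state = step @ state
        if n_flips:
            state = X @ state
    return float(abs(state[1]) ** 2)


xi = 2 * np.pi * 0.93e-3        # coupling at d = 2.41 um from the text, rad/s
d_omega = 2 * np.pi * 20e-3     # residual gradient splitting from the text, rad/s
T = 15.0                        # experiment duration, s

ideal = np.sin(2 * xi * T) ** 2                           # no gradient
no_echo = transfer_probability(xi, d_omega, T)            # gradient, no flips
echo = transfer_probability(xi, d_omega, T, n_flips=30)   # flips every 0.5 s (2 Hz)
print(f"ideal {ideal:.4f}, gradient only {no_echo:.4f}, with echo train {echo:.4f}")
```

With these values the gradient alone noticeably reduces and distorts the transfer probability relative to the gradient-free case $\sin^2(2\xi T)$, whereas the echo train restores it to within a small fraction of that ideal value.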

We used parity analysis to obtain a physical observable that is first-order sensitive to the interaction strength and the experiment time. The parity observable measures the coherence between $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$. To estimate it, we applied the following experimental sequence (Fig. 1d). The system state was initialized to $|\uparrow\downarrow\rangle$ and then evolved under spin-spin interaction to $|\Psi(T)\rangle = \cos(2\xi T)|\uparrow\downarrow\rangle + i\sin(2\xi T)|\downarrow\uparrow\rangle$. We then applied a controlled magnetic field gradient, adding a superposition phase $\phi_{\mathrm{parity}}$ to yield $|\Psi(T)\rangle = \cos(2\xi T)|\uparrow\downarrow\rangle + ie^{i\phi_{\mathrm{parity}}}\sin(2\xi T)|\downarrow\uparrow\rangle$. The expectation value of parity was then estimated as $\langle\mathrm{parity}\rangle = P_{\uparrow\uparrow} + P_{\downarrow\downarrow} - P_{\uparrow\downarrow,\downarrow\uparrow}$, where $P_{\uparrow\uparrow}$, $P_{\downarrow\downarrow}$ and $P_{\uparrow\downarrow,\downarrow\uparrow} = P_{\uparrow\downarrow} + P_{\downarrow\uparrow}$ are the probabilities of finding the system in the respective states, measured projectively after performing a collective $\pi/2$ spin rotation (Supplementary Information). In this case, $\langle\mathrm{parity}\rangle = \sin(4\xi T)\sin(\phi_{\mathrm{parity}})$. The parity visibility, $\sin(4\xi T)$, is extracted either by scanning $\phi_{\mathrm{parity}}$ (Fig. 1e) or by setting it to $\pi/2$. Geometrically, parity corresponds to the projection of the Bloch vector on the x axis (Fig. 1d, rightmost sphere) and its visibility corresponds to the projection of the Bloch vector on the equatorial plane (Fig. 1c, black double arrow).
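Extracting the visibility from a scanned fringe (Fig. 1e) amounts to fitting a single amplitude $A$ in $\langle\mathrm{parity}\rangle = A\sin\phi_{\mathrm{parity}}$. The sketch below does this by linear least squares on synthetic data; the fringe sampling, the noise level of 0.02 and the random seed are hypothetical, while $\alpha = 0.75$ and $\xi = 2\pi \times 0.93$ mHz are the values quoted in the text.

```python
import numpy as np


def fit_amplitude(phi, parity):
    """Least-squares amplitude A of the model <parity> = A * sin(phi)."""
    basis = np.sin(phi)
    return float(basis @ parity / (basis @ basis))


xi = 2 * np.pi * 0.93e-3     # coupling from the text, rad/s
T = 15.0                     # experiment duration, s
alpha = 0.75                 # visibility degradation factor from the text
expected = alpha * np.sin(4 * xi * T)

phi = np.linspace(0, 2 * np.pi, 25)
clean = expected * np.sin(phi)
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.02, phi.size)   # hypothetical measurement noise

amp_clean = fit_amplitude(phi, clean)
amp_noisy = fit_amplitude(phi, noisy)
print(f"expected {expected:.3f}, fit (clean) {amp_clean:.3f}, fit (noisy) {amp_noisy:.3f}")
```

Because the model is linear in $A$, no iterative fitting is needed; a free phase offset could be handled the same way by adding a $\cos\phi$ basis column.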

Measuring a weak, millihertz-scale interaction requires an experiment duration of many seconds. Detection fidelity, however, deteriorates at these long times owing to ion motion heating, eventually limiting the experiment duration. As a result, the measured parity visibility is reduced by a factor of $\alpha = 1 - 4D(1 - D)$, where $D$ is the average of the $|\uparrow\uparrow\rangle$ and $|\downarrow\downarrow\rangle$ detection fidelities. Figure 2a shows the detection fidelities as a function of time for $|\uparrow\uparrow\rangle$ and $|\downarrow\downarrow\rangle$ at an inter-ion distance of $d = 2.4$ μm. Although our measurement scheme was tailored to be first-order insensitive to heating, its residual effect degraded the detection fidelity from more than 0.95 at $T = 5$ s to as low as 0.88 at $T = 25$ s. These measurements are consistent with our measured heating rates of ~10 quanta per second. Asymmetry in the detection scheme accounts for the fidelity of the $|\uparrow\uparrow\rangle$ measurement being better than that of the $|\downarrow\downarrow\rangle$ measurement. Similar detection fidelities are displayed in Fig. 2b as functions of ion separation, for a fixed experiment time $T = 15$ s. As seen, detection fidelity increases at lower inter-ion distances, corresponding to higher trap frequencies, where the effect of heating is known to be less pronounced. A further, minor reduction in $\alpha$ by a factor of >0.98 is due to imperfect initialization. See Supplementary Information for a complete discussion of heating, detection and initialization as well as their effect on $\alpha$.

Figure 2 | Characterization of quantum decoherence. a, Detection fidelity versus experiment time for $d = 2.4$ μm. The probabilities of measuring $|\uparrow\uparrow\rangle$ and $|\downarrow\downarrow\rangle$, given that the system was initialized to $|\uparrow\uparrow\rangle$ and $|\downarrow\downarrow\rangle$, respectively, are shown by the red and blue dots. Each point is the average of N = 119 projective measurements. Solid lines are linear best fits. During the experiment, collective spin flips are applied at a period of 0.5 s, as in the actual spin-spin experiment. Detection fidelities degrade owing to ion motion heating. b, Detection fidelity versus $d$ for $T = 15$ s (similar to a). The points at $d$ = 2.18, 2.41 and 2.76 μm are the averages of N = 147, 119 and 201 projective measurements, respectively. c, Dephasing time estimate. The system is initialized to $|\Psi_+\rangle = (|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle)/\sqrt{2}$ and then a train of spin flips is applied as in a and b. Parity analysis is performed after wait times of $T = 1$ s and 15 s, shown by the blue and red dots, respectively. Each point is the average of N = 226 projective measurements. The blue and red lines are best fits to a cosine fringe, yielding respective amplitudes of 0.81(5) and 0.59(4) and a conservative estimate for the dephasing time of 44 ± 12 s. When factoring out detection efficiency, we observe no statistically significant dephasing. All error bars indicate projection noise.
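The visibility degradation factor can also be written as $\alpha = 1 - 4D(1 - D) = (2D - 1)^2$, so it falls quadratically as $D$ drops below 1. This short sketch evaluates $\alpha$ at the detection fidelities quoted above and inverts the relation to find the average fidelity implied by $\alpha = 0.75$ (the value used for $T = 15$ s in Fig. 3); the >0.98 initialization factor is deliberately ignored.

```python
import math


def alpha_from_fidelity(D):
    """Visibility degradation factor alpha = 1 - 4 D (1 - D), identical to (2 D - 1)**2."""
    return 1 - 4 * D * (1 - D)


def fidelity_from_alpha(alpha):
    """Inverse of alpha_from_fidelity on the physical branch D >= 1/2."""
    return (1 + math.sqrt(alpha)) / 2


for D in (0.95, 0.88):      # fidelities quoted at T = 5 s and T = 25 s
    print(f"D = {D:.2f} -> alpha = {alpha_from_fidelity(D):.3f}")
print(f"alpha = 0.75 -> D = {fidelity_from_alpha(0.75):.3f}")
```

At $D = 0.95$ this gives $\alpha \approx 0.81$, and $\alpha = 0.75$ corresponds to an average detection fidelity of about 0.93, between the values quoted at 5 s and 25 s.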

We limited our experiment duration to $T = 15$ s, beyond which the decrease in $\alpha$ outweighs the gain in signal. Moreover, at the high trap voltages used, longer experiments resulted in substantial ion loss from the trap owing to ion-crystal instabilities, thereby severely limiting the long averaging required to obtain statistical significance. The optimal 15-s duration was nevertheless long enough for dephasing to potentially limit the observation of spin-spin interaction. Here dephasing within the DFS, for example due to residual noise in the magnetic field gradient, averages away the relative superposition phase between $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$, resulting in a parity visibility that decreases with time. To characterize this phase coherence, the system was initialized to $|\Psi_+\rangle = (|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle)/\sqrt{2}$ using a Mølmer–Sørensen entangling gate. This was followed by a wait of duration $T$ while performing spin flips as in the actual spin-spin experiment, and ended with parity analysis. The state $|\Psi_+\rangle$ was chosen because it is invariant under spin-spin coupling while being sensitive to dephasing. Figure 2c displays the results for $T = 1$ s and $T = 15$ s using blue and red circles, respectively. A best fit to a cosine yields parity amplitudes of 0.81(5) ($T = 1$ s) and 0.59(4) ($T = 15$ s). A conservative estimate of the coherence time, not taking detection fidelity into account, yields 44 ± 12 s. Taking into account the degradation in detection, we observe no statistically significant dephasing after 15 s.
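The 44 ± 12 s figure can be approximately reproduced if one assumes an exponential decay of the parity amplitude, $A(T) = A_0\exp(-T/\tau)$, through the two measured amplitudes. Both the exponential model and the first-order error propagation with independent amplitude errors are assumptions of this sketch; the authors' exact procedure is described in the Supplementary Information.

```python
import math


def coherence_time(t1, a1, t2, a2):
    """tau of A(t) = A0 exp(-t / tau) passing through (t1, a1) and (t2, a2)."""
    return (t2 - t1) / math.log(a1 / a2)


def coherence_time_error(t1, a1, s1, t2, a2, s2):
    """First-order uncertainty of tau for independent amplitude errors s1 and s2."""
    log_ratio = math.log(a1 / a2)
    tau = (t2 - t1) / log_ratio
    return tau * math.hypot(s1 / a1, s2 / a2) / log_ratio


tau = coherence_time(1.0, 0.81, 15.0, 0.59)
err = coherence_time_error(1.0, 0.81, 0.05, 15.0, 0.59, 0.04)
print(f"tau = {tau:.0f} +/- {err:.0f} s")
```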
We now turn to the main results of this Letter. Figure 3 presents the parity measurements for two electronic spins undergoing magnetic dipolar interaction, at an inter-ion distance of $d = 2.4$ μm. A parity oscillation of $\langle\mathrm{parity}\rangle = \pm\alpha\sin(4\xi T)\sin(\phi_{\mathrm{parity}})$ is expected, which will be positive when the initial state is $|\uparrow\downarrow\rangle$ and negative for the initial state $|\downarrow\uparrow\rangle$. Figure 3a shows $\langle\mathrm{parity}\rangle$ versus $\phi_{\mathrm{parity}}$ for $T = 0.1$ s, which is much shorter than the spin-spin coupling timescale. As expected, no significant parity oscillation amplitude is detected. The results of the long, $T = 15$ s experiment are shown in Fig. 3b. Here the sinusoidal dependence on $\phi_{\mathrm{parity}}$ becomes evident. The solid blue and red lines are calculated from theory without any adjustable parameters, showing good agreement with the measured data. Shaded areas represent measurement uncertainties in determining $\alpha$. The theoretical interaction strength at $d = 2.4$ μm is $\xi = 2\pi \times 0.93$ mHz, in agreement with a single-parameter best fit of the data to the above theory, which yields $\xi = 2\pi \times 0.9(1)$ mHz. With the sinusoidal dependence of the parity signal on $\phi_{\mathrm{parity}}$ established, the parity visibility can be measured by fixing $\phi_{\mathrm{parity}} = \pi/2$, acquiring a single point rather than a complete sinusoidal fringe. In Fig. 3c we display visibility versus interaction time (blue circles), which is in agreement with the theoretical curve, visibility $= \alpha\sin(4\xi T)$ (blue line).
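Because visibility $= \alpha\sin(4\xi T)$, a single visibility measurement with known $\alpha$ can be inverted for $\xi$, provided $4\xi T < \pi/2$ so that the arcsine is single-valued. The sketch below applies this to the visibility of 0.28 reported for the entanglement-witness run at $d = 2.4$ μm and $T = 15$ s, paired with $\alpha = 0.75$ from Fig. 3; pairing these two numbers is an illustrative choice, not the paper's fit.

```python
import math


def coupling_from_visibility(visibility, alpha, T):
    """Invert visibility = alpha * sin(4 xi T) for xi in rad/s (valid for 4 xi T < pi / 2)."""
    ratio = visibility / alpha
    if not 0 <= ratio <= 1:
        raise ValueError("visibility / alpha must lie in [0, 1]")
    return math.asin(ratio) / (4 * T)


xi = coupling_from_visibility(0.28, 0.75, 15.0)
print(f"xi / 2pi ~ {xi / (2 * math.pi) * 1e3:.2f} mHz")
```

The result, about $2\pi \times 1.0$ mHz, is of the same order as the theoretical $2\pi \times 0.93$ mHz; the ±0.02 uncertainty in the visibility alone shifts it by roughly ±0.08 mHz.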
Figure 3 | Coherent oscillations due to magnetic spin-spin interaction for $d = 2.4$ μm. a, Parity analysis of a 0.1-s spin-spin experiment. Blue and red dots show parity measurements for the initial states $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$, respectively. Each point is the average of N = 945 projective measurements. Solid lines are the spin-spin theory with no adjustable parameters, taking into account the preparation and detection fidelities characterized in Fig. 2a. b, Same as a, for $T = 15$ s. Each point is the average of N = 236 projective measurements. Shaded areas are 1-s.d. intervals for the solid-line theories, due to the uncertainties in characterizing the preparation and detection fidelities. A best fit to $\alpha\sin(4\xi T)\sin(\phi_{\mathrm{parity}})$ (not shown) yields an estimated coupling constant of $\xi = 2\pi \times 0.9(1)$ mHz, in reasonable agreement with theory ($\xi = 2\pi \times 0.93$ mHz). Here $\alpha = 0.75$ is the visibility degradation factor, extracted from the data shown in Fig. 2a, as explained in the text. c, Parity amplitude (visibility) versus spin-spin interaction time, $T$. The parity observable is measured at $\phi_{\mathrm{parity}} = \pi/2$. The points at $T$ = 0, 6, 8, 10, 12 and 15 s are the averages of N = 2,001, 1,000, 1,000, 600, 501 and 819 projective measurements, respectively. Solid line and shaded area are the same as in a and b. A best fit to $\alpha\sin(4\xi T)$ (not shown) yields $\xi = 2\pi \times 1.1(2)$ mHz. All error bars indicate projection noise.


©2014 Macmillan Publishers Limited. All rights reserved 


Although only partial entanglement is generated by spin-spin interaction after 15 s, it can still be observed by measuring a negative expectation value for a properly chosen entanglement witness. Here we chose the swap operator, defined as $\mathrm{swap}|a, b\rangle = |b, a\rangle$ for any two single-spin states $|a\rangle$ and $|b\rangle$. In terms of the two spins' density matrix, $\langle\mathrm{swap}\rangle = P_{\uparrow\uparrow} + P_{\downarrow\downarrow} - \mathrm{visibility}$. Therefore, entanglement is proven by experimentally verifying that $P_{\uparrow\uparrow} + P_{\downarrow\downarrow} < \mathrm{visibility}$. We repeated the spin-spin experiment N = 2,388 times for $d = 2.4$ μm, measuring visibility = 0.28(2) and $P_{\uparrow\uparrow} + P_{\downarrow\downarrow} = 0.11(1)$. These conservative estimates, which do not take the deterioration in detection fidelity into account, rendered the entanglement witness negative with good statistical significance: $\langle\mathrm{swap}\rangle = -0.16(2)$. Here we assume a projection-noise-limited error in the measured probabilities, supported by an Allan deviation analysis. Taking detection fidelity into account, using the calibration shown in Fig. 2a, our maximum-likelihood estimate renders $\langle\mathrm{swap}\rangle = -0.41(4)$. See Supplementary Information for details of the maximum-likelihood estimation and the Allan deviation analysis.
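Why a negative $\langle\mathrm{swap}\rangle$ certifies entanglement can be seen numerically: for any product state $|a\rangle|b\rangle$, $\langle\mathrm{swap}\rangle = |\langle a|b\rangle|^2 \ge 0$ (and mixtures of product states inherit this), whereas the evolved state with $\phi_{\mathrm{parity}} = \pi/2$, $\cos\theta|\uparrow\downarrow\rangle - \sin\theta|\downarrow\uparrow\rangle$ with $\theta = 2\xi T$, gives $-\sin 2\theta$. The sketch samples random product states as a check (the seed and sample count are arbitrary) and then evaluates the witness from the rounded populations quoted above; the rounded inputs give −0.17, consistent with the quoted −0.16(2).

```python
import numpy as np

# Two-spin basis order: |uu>, |ud>, |du>, |dd>
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)


def swap_expectation(state):
    """<state| SWAP |state> for a normalised two-spin state vector."""
    state = np.asarray(state, dtype=complex)
    return float(np.real(state.conj() @ SWAP @ state))


rng = np.random.default_rng(1)


def random_spin():
    """A random normalised single-spin state."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)


# Product states |a>|b>: <swap> = |<a|b>|^2, never negative
min_product = min(swap_expectation(np.kron(random_spin(), random_spin()))
                  for _ in range(1000))

# Evolved state with phi_parity = pi / 2: cos(theta)|ud> - sin(theta)|du>, theta = 2 xi T
theta = 2 * (2 * np.pi * 0.93e-3) * 15.0
entangled = np.array([0, np.cos(theta), -np.sin(theta), 0])
print(f"min over product states: {min_product:.4f}, "
      f"entangled: {swap_expectation(entangled):.3f}")

# Witness from the rounded values quoted in the text: P_uu + P_dd - visibility
witness = 0.11 - 0.28
print(f"witness from rounded inputs: {witness:.2f}")
```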

Finally, the dependence of the spin-spin interaction on inter-electron distance is revealed by repeating the above measurement at different ion separations, $d$. Figure 4a shows the measured parity visibility (blue circles) versus $d$, in good agreement with theory (blue line): visibility $= \alpha\sin(4\xi T)$. Here $\alpha$ is extracted from the data shown in Fig. 2b, and decreases from $\alpha = 0.84$ at $d = 2.18$ μm to $\alpha = 0.70$ at $d = 2.76$ μm owing to the larger motional heating rates at lower trap frequencies. The 64% decrease in the visibility is thus a combined effect of the 17% decrease in $\alpha$ and the additional decrease in the spin-spin coupling constant, $\xi$.
Figure 4 | Magnetic spin-spin interaction as a function of distance. a, Parity visibility versus ion separation, $d$, is shown by the blue dots for a fixed experiment time, $T = 15$ s. The points at $d$ = 2.18, 2.41 and 2.76 μm are the averages of N = 2,306, 4,204 and 1,796 projective measurements, respectively. Error bars indicate projection noise. Solid line is spin-spin theory without adjustable parameters, taking preparation and detection fidelities into account, as characterized in Fig. 2b. Shaded blue area is a 1-s.d. interval for the solid-line theory, due to the uncertainties in characterizing the preparation and detection fidelities. b, Spin-spin coupling strength $\xi$ versus ion separation (log–log scale). Blue dots are extracted from a, using visibility $= \alpha\sin(4\xi T)$. The visibility degradation factor, $\alpha$, is extracted from the data in Fig. 2b. Error bars are 1-s.d. estimates due to projection noise in both the measurements and the extraction of $\alpha$. The error bar for the $d = 2.76$ μm point is slightly larger owing to the corresponding decrease in $\alpha$. Solid black line is spin-spin theory without any adjustable parameters. A linear best fit to $\xi = \mu_0(g\mu_{\mathrm{B}}/2)^2/(4\pi\hbar d^n)$ (not shown) yields $n = 3.0(4)$, consistent with the $n = 3$ theoretical exponent. Shaded blue area indicates the $n = 3.0 \pm 0.4$ curves.
We can therefore estimate $\xi$ using the measured parity visibility and the independently measured $\alpha$, as shown in Fig. 4b (blue circles). A best fit to $\xi = \mu_0(g\mu_{\mathrm{B}}/2)^2/(4\pi\hbar d^n)$ yields $n = 3.0(4)$, in agreement with the cubic dependence of magnetic spin-spin interactions. Our apparatus allowed for only a relatively small variation in $d$. Shorter distances required operating the trap at voltages higher than 400 V, where trap instabilities limited our integration efforts. Larger inter-ion separations resulted in a diminishing signal-to-noise ratio. Therefore, improving our measurement uncertainty requires a redesign of the ion trap, targeted at high-voltage operation.
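The exponent $n$ can be obtained by linear regression of $\log\xi$ on $\log d$. This sketch runs that fit on noise-free theory values at the three separations of Fig. 4, which returns $n = 3$ by construction, and on the same values with a hypothetical 10% scatter (the seed is arbitrary), illustrating how a narrow range of $d$ limits the precision of $n$. The free-electron g-factor is an assumption.

```python
import numpy as np

MU_0 = 1.25663706212e-6     # vacuum permeability, N A^-2
MU_B = 9.2740100783e-24     # Bohr magneton, J T^-1
HBAR = 1.054571817e-34      # reduced Planck constant, J s
G_E = 2.00231930436         # free-electron g-factor (assumed)


def fit_power_law(d, xi):
    """Fit xi = C * d**(-n) by linear regression of log(xi) on log(d); returns (n, C)."""
    slope, intercept = np.polyfit(np.log(d), np.log(xi), 1)
    return -slope, np.exp(intercept)


d = np.array([2.18e-6, 2.41e-6, 2.76e-6])     # separations used in Fig. 4, m
xi_theory = MU_0 * (G_E * MU_B / 2) ** 2 / (4 * np.pi * HBAR * d ** 3)

rng = np.random.default_rng(0)
xi_scattered = xi_theory * (1 + 0.1 * rng.standard_normal(d.size))   # hypothetical 10% scatter

n_exact, _ = fit_power_law(d, xi_theory)
n_scattered, _ = fit_power_law(d, xi_scattered)
print(f"n (theory) = {n_exact:.3f}, n (10% scatter) = {n_scattered:.2f}")
```

Because $\ln(2.76/2.18)$ is only about 0.24, a 10% error in $\xi$ translates into an error of order 0.4 in $n$, in line with the reported $n = 3.0(4)$.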

Because the measured interaction is very weak, competing spurious effects must be ruled out. Specifically, the ions' motion in the trap due to heating translates to a magnetic field in the ions' rest frame. Although this cannot lead to spin entanglement, it could contribute to the parity signal. However, because such a field would oscillate at the trap frequencies, which are all below 5 MHz, it would couple non-resonantly to the spins, whose resonant frequency is ~13.16 MHz. In Supplementary Information we quantitatively rule out this effect, as well as other possible competing phenomena. These include inter-ion distance fluctuations, quantization axis misalignment, radio-frequency electrode leakage and trap-electrode-generated magnetic field gradients.

We have used a combination of techniques, originally developed for the protection of quantum information, to measure the weak spin-spin magnetic interaction. Future improvements of this experiment may suggest a new platform for the exploration of anomalous spin forces on the micrometre scale. The use of DFSs, which was central to our approach, is not restricted to the specifics of the reported experiment, and DFSs could be used in other metrological scenarios. Quantum information processing continues to drive metrology, with recent proposals that harness quantum error correction for sensitive measurements.


METHODS SUMMARY 


Our apparatus enabled us to place the electronic spins at a controlled distance from one another, as well as to initialize, manipulate and detect their internal spin state with high fidelity. Details of the set-up are found in ref. 8 as well as in the Supplementary Information. Briefly, a Coulomb crystal of two ions was formed in an electrical Paul trap with Doppler cooling. We used external voltages to push the ions against their Coulomb repulsion (Fig. 1a), thus controlling the inter-ion separation, $d$. The minimal distance attained was limited by our ability to maintain stable ion crystals without incurring a trap voltage breakdown. The inter-ion distance is the difference between the equilibrium positions of two charged particles trapped in a harmonic trap,

$$d = \left(\frac{2k_e e^2}{M(2\pi f_{\mathrm{trap}})^2}\right)^{1/3},$$

where $k_e$ is Coulomb's constant, $e$ is the electron charge and $M$ is the mass of $^{88}$Sr$^{+}$. The oscillation frequency, $f_{\mathrm{trap}}$, was measured spectroscopically. For $^{88}$Sr$^{+}$, the valence electron spin states are $|\uparrow\rangle = |5s_{1/2}, J = 1/2, m_J = 1/2\rangle$ and $|\downarrow\rangle = |5s_{1/2}, J = 1/2, m_J = -1/2\rangle$. State initialization to $|\uparrow\uparrow\rangle$ was done by optical pumping. We were able to perform all possible collective spin rotations by pulsing a resonant radio-frequency magnetic field and tuning the pulse duration and the radio-frequency field phase. State detection was performed by state-selective fluorescence, distinguishing $|\uparrow\uparrow\rangle$ and $|\downarrow\downarrow\rangle$ from one another and from both $|\uparrow\downarrow\rangle$ and $|\downarrow\uparrow\rangle$, which were indistinguishable. All these collective operations had typical fidelities of more than 98%. We used inhomogeneities in the ion trap potential to perform differential spin rotations, and were able to generate, for example, $|\uparrow\downarrow\rangle$ with a typical fidelity of more than 98%. Finally, we were able to generate the entangled states $|\Psi_\pm\rangle = (|\uparrow\downarrow\rangle \pm |\downarrow\uparrow\rangle)/\sqrt{2}$ using a Mølmer–Sørensen entangling gate with a typical fidelity of 95% (ref. 21).
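The separation formula follows from force balance: with the ions at $\pm d/2$, the trap restoring force $M(2\pi f_{\mathrm{trap}})^2 d/2$ on each ion equals the Coulomb repulsion $k_e e^2/d^2$. The sketch evaluates the trap frequency required for each separation used in the experiment, using the atomic mass of $^{88}$Sr and neglecting the missing electron (an approximation of this sketch).

```python
import math

K_E = 8.9875517923e9         # Coulomb constant, N m^2 C^-2
E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # atomic mass unit, kg
M_ION = 87.9056 * AMU        # 88Sr atomic mass; the missing electron is neglected


def ion_separation(f_trap):
    """Equilibrium separation d (m) of two ions for axial trap frequency f_trap (Hz)."""
    omega = 2 * math.pi * f_trap
    return (2 * K_E * E_CHARGE ** 2 / (M_ION * omega ** 2)) ** (1 / 3)


def trap_frequency(d):
    """Axial trap frequency (Hz) that yields separation d (m); inverse of ion_separation."""
    return math.sqrt(2 * K_E * E_CHARGE ** 2 / (M_ION * d ** 3)) / (2 * math.pi)


for d in (2.18e-6, 2.41e-6, 2.76e-6):
    print(f"d = {d * 1e6:.2f} um -> f_trap = {trap_frequency(d) / 1e6:.2f} MHz")
```

These come out at roughly 2.0 to 2.8 MHz, consistent with the statement in the main text that all trap frequencies were below 5 MHz.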
Water has a number of anomalous physical properties, and some of these become drastically enhanced on supercooling below the freezing point. Particular interest has focused on thermodynamic response functions that can be described using a normal component and an anomalous component that seems to diverge at about 228 kelvin (refs 1–3). This has prompted debate about conflicting theories that aim to explain many of the anomalous thermodynamic properties of water. One popular theory attributes the divergence to a phase transition between two forms of liquid water occurring in the 'no man's land' that lies below the homogeneous ice nucleation temperature ($T_{\mathrm{H}}$) at approximately 232 kelvin and above about 160 kelvin, and where rapid ice crystallization has prevented any measurements of the bulk liquid phase. In fact, the reliable determination of the structure of liquid water typically requires temperatures above about 250 kelvin. Water crystallization has been inhibited by using nanoconfinement, nanodroplets and association with biomolecules to give liquid samples at temperatures below $T_{\mathrm{H}}$, but such measurements rely on nanoscopic volumes of water where the interaction with the confining surfaces makes the relevance to bulk water unclear. Here we demonstrate that femtosecond X-ray laser pulses can be used to probe the structure of liquid water in micrometre-sized droplets that have been evaporatively cooled below $T_{\mathrm{H}}$. We find experimental evidence for the existence of metastable bulk liquid water down to temperatures of 227 kelvin in the previously largely unexplored no man's land. We observe a continuous and accelerating increase in structural ordering on supercooling to approximately 229 kelvin, where the number of droplets containing ice crystals increases rapidly. But a few droplets remain liquid for about a millisecond even at this temperature. The hope now is that these observations and our detailed structural data will help identify those theories that best describe and explain the behaviour of water.

Figure 1a sketches our experimental set-up: in vacuum, a liquid jet generates spatially unconfined droplets of supercooled liquid water, the structure of which is then studied using intense, 50-fs, hard-X-ray laser pulses from the Linac Coherent Light Source (LCLS). A Rayleigh jet produces a continuous, single-file train of water droplets with a uniform diameter of 34 or 37 μm, and a gas dynamic virtual nozzle gives trains of smaller droplets with a diameter of 9 or 12 μm. The droplets cool rapidly through evaporation and reach an average temperature that depends primarily on droplet size and travel time through the vacuum (Supplementary Information, section B.3), which we varied systematically by adjusting the distance between the dispenser nozzle and the X-ray pulse interaction region. Scattering patterns (at least 1,800 per data point; Supplementary Information, section A.1.3) were recorded from individual droplets with a single X-ray pulse over the temperature ranges of 227–252 K and 233–258 K for the droplets generated by the gas dynamic virtual nozzle and the Rayleigh jet, respectively. Temperature calibrations were performed using the Knudsen theory of evaporative cooling with a typical absolute uncertainty of ±2 K at the lowest temperatures (Supplementary Information, section B.3.5; Supplementary Tables 5 and 6; and Supplementary Fig. 20); the Knudsen theory was verified through extensive molecular dynamics simulations of droplet cooling (Supplementary Information, section B.2).
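Travel time and nozzle-to-interaction distance are related through the droplet speed. Assuming the droplets move at a constant speed after leaving the nozzle (an assumption of this sketch), the 10.35 m s⁻¹ quoted for the 12-μm droplets in Fig. 2 converts the ~4 ms and ~5 ms travel times discussed below into distances of roughly 4 to 5 cm.

```python
SPEED = 10.35   # droplet speed for the 12-um droplets (Fig. 2), m/s


def travel_time(distance, speed=SPEED):
    """Time (s) to cover distance (m) at constant speed (assumed: no deceleration in vacuum)."""
    return distance / speed


def nozzle_distance(time, speed=SPEED):
    """Distance (m) from the nozzle reached after time (s) at constant speed."""
    return time * speed


for t_ms in (4.0, 5.0):
    print(f"{t_ms:.0f} ms -> {nozzle_distance(t_ms * 1e-3) * 1e3:.1f} mm")
```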

Figure 1b shows a typical diffraction pattern of a liquid water droplet with a maximum momentum transfer of q ≈ 3.5 Å⁻¹ at the corners of the detector. The isotropic water diffraction rings uniquely identify the liquid. As droplet temperature decreases, broad and bright Bragg peaks from hexagonal ice (Supplementary Information, section A.2.1) are sometimes superposed on the water scattering ring (Fig. 1c). We denote as 'water shots' diffraction patterns where we detected only water scattering, and as 'ice shots' diffraction patterns containing Bragg peaks. Ice could be detected if the illuminated volume of a droplet contained >0.05% of ice by mass (Supplementary Information, section A.2.2). For the 12-μm-diameter droplets, the number of ice shots increased drastically as the travel time exceeded ~4 ms (Fig. 2). Probing individual water droplets with single, ultrashort X-ray pulses allowed us to separate water shots from ice shots at each temperature, giving the fraction of droplets containing ice. Even among the droplets that interacted with X-ray pulses approximately 5 ms after exiting the nozzle, at the coldest liquid temperature of 227 K, we found more than 100 water shots that contained water diffraction rings and no ice diffraction peaks out of 3,600 total hits; this temperature is below the onset temperature of ice nucleation (Fig. 2) and inside water's no man's land below $T_{\mathrm{H}} \approx 232$ K at ambient pressure.

Figure 1 | Coherent X-ray scattering from individual micrometre-sized droplets with a single-shot selection scheme. a, A train of droplets (Supplementary Information, section A.1.1) flows in vacuum perpendicular to ~50-fs-long X-ray pulses. A coherent scattering pattern from a water droplet was recorded when a single droplet was positioned in the interaction region at the time of arrival of a single X-ray pulse. CSPAD, Cornell–SLAC pixel array detector. b, c, Each diffraction pattern is classified (Supplementary Information, section A.1.3) either as a water shot, exclusively containing pure liquid scattering characterized by a diffuse water ring (b), or as an ice shot, characterized by intense and discrete Bragg peaks superposed on the water scattering ring (c).
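The significance of the surviving liquid droplets can be gauged with simple binomial statistics. Treating each hit as an independent trial (an assumption of this sketch), the code computes the water-shot fraction and its standard error for 100 of 3,600 hits; because more than 100 water shots were found, this fraction is a lower bound.

```python
import math


def fraction_with_error(k, n):
    """Binomial fraction k / n and its standard error sqrt(p (1 - p) / n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


p_liquid, err = fraction_with_error(100, 3600)
print(f"water-shot fraction >= {p_liquid:.2%} (standard error {err:.2%})")
```

A fraction of about 2.8% with a standard error near 0.3% is roughly ten standard errors above zero, so the liquid droplets are not a statistical fluke under this model.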

Structural properties of supercooled water in no man's land can be extracted from the X-ray scattering data. The total scattering factor, S(q), for each droplet temperature was obtained by averaging the scattering patterns from the respective water shots and removing the independent atomic scattering (Supplementary Information, section A.3.1). The resulting sequence of temperature-dependent S(q) profiles for liquid water is shown in Fig. 3a, illustrating how the principal maximum of S(q) is split into a peak $S_1$ located at $q_1 \approx 2$ Å⁻¹ and a peak $S_2$ located at $q_2 \approx 3$ Å⁻¹. The amplitudes of, and separation between, $S_1$ and $S_2$ increase with decreasing droplet temperature, indicative of an increase in structural ordering as water is supercooled ever more deeply towards and into no man's land. Figure 3b illustrates the shift in the positions of the two peaks with temperature, as obtained from water droplets using the LCLS X-ray laser, and also based on synchrotron radiation data from metastable liquid water at only slightly supercooled temperatures and from stable liquid water at ambient temperatures; for the latter two data sets, the temperature was measured directly. The continuous changes in the $S_1$ and $S_2$ peak positions, without apparent break when moving between the independent X-ray laser and synchrotron radiation data sets, strongly support the temperature calibration.

Figure 2 | Time dependence of water crystallization during evaporative cooling. Ice shot fraction (green) and estimated temperature (blue) as functions of travel time in vacuum for droplets of diameter 12 μm and speed 10.35 m s⁻¹. From the ice shot fraction, shown as mean ± s.d. of two to seven individual recordings, we find the onset of ice nucleation to lie between 232 and 229 K. The dashed blue lines represent maximum and minimum temperatures from the Knudsen model, which consistently overlap with experimental data sets from SSRL measured at known absolute temperatures (Supplementary Information, sections A.3.2 and B.3.5).
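The peak positions in Fig. 3b are described as maxima of local fifth-order polynomial least-squares fits. A minimal version of that idea, using NumPy's `Polynomial.fit`, is sketched below; the window half-width of 0.3 Å⁻¹ and the synthetic two-Gaussian S(q) are hypothetical, and the error-bar procedure described in the caption is not reproduced.

```python
import numpy as np


def peak_position(q, s, q_guess, half_width=0.3, order=5):
    """Maximum of s(q) near q_guess from a local polynomial least-squares fit.

    Points with |q - q_guess| <= half_width are fitted with a polynomial of the
    given order; among the real stationary points inside the window, the one with
    the largest fitted value is returned.
    """
    window = np.abs(q - q_guess) <= half_width
    poly = np.polynomial.Polynomial.fit(q[window], s[window], order)
    roots = np.atleast_1d(poly.deriv().roots())
    real = roots[np.abs(roots.imag) < 1e-9].real
    real = real[np.abs(real - q_guess) <= half_width]
    if real.size == 0:
        raise ValueError("no stationary point inside the fit window")
    return float(real[np.argmax(poly(real))])


# Hypothetical S(q): two Gaussian peaks standing in for S1 (~2 1/A) and S2 (~3 1/A)
q = np.linspace(1.0, 4.0, 301)
s = np.exp(-0.5 * ((q - 2.0) / 0.3) ** 2) + 0.9 * np.exp(-0.5 * ((q - 3.0) / 0.3) ** 2)

q1 = peak_position(q, s, 2.0)
q2 = peak_position(q, s, 3.0)
print(f"S1 at {q1:.3f} 1/A, S2 at {q2:.3f} 1/A, separation {q2 - q1:.3f} 1/A")
```

Fitting only a local window keeps the polynomial order low while still following the curvature of each peak, which is why the two maxima can be tracked independently as they move apart on cooling.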

At the lowest temperature accessible in the current study, the $S_1$ and $S_2$ peak positions approach the corresponding peak positions for tetrahedrally coordinated structures (Fig. 3b), represented by low-density amorphous (LDA) ice as obtained by neutron scattering and modelled for clusters of hexagonal ice using kinematic scattering (Supplementary Information, section A.2.3). The rapidly increasing fraction of ice shots on the millisecond timescale below ~229 K shows that glassy water, which would be associated with slow dynamics, is not formed at these temperatures (Fig. 2). We note that the continuous change in the positions of the diffraction peaks $S_1$ and $S_2$ with temperature resembles the trend of similar diffraction peaks when high-density amorphous ice transforms into LDA ice.

The separation between the $S_1$ and $S_2$ peaks is very sensitive to the amount and character of tetrahedrally coordinated configurations favoured by water's directional hydrogen bonds. Water exhibits a peak in the oxygen–oxygen pair-correlation function, $g_{\mathrm{OO}}(r)$, at 4.5 Å, corresponding to the second-nearest-neighbour distance in tetrahedral coordination, $\sqrt{8/3}\,a$, where $a \approx 2.8$ Å is the nearest-neighbour oxygen–oxygen distance in liquid water. The inset of Fig. 4a shows that increasing temperature reduces the amplitude of the second $g_{\mathrm{OO}}(r)$ peak (henceforth denoted $g_2$), here obtained from molecular dynamics simulations using the TIP4P/2005 water model (Methods); the same trend has also been observed experimentally for an increase in pressure or temperature. We find that $g_2$ can be exploited as a good structural parameter to describe the
Figure 3 | Temperature dependence of water scattering peaks. a, Scattering
structure factor, S(q), obtained from single-shot diffraction patterns
(Supplementary Information, section A.3.1). Water temperature decreases
from bottom to top (SSRL: 323, 298, 273, 268, 263, 258, 253, 251 K; LCLS: 251,
247, 243, 239, 232, 229, 227 K). The data reveal a split of the principal S(q)
maximum into two well-separated peaks, S1 and S2 (dashed lines).

b, Temperature dependence of the S1 and S2 peak positions, calculated from
the maxima of local fifth-order polynomial least-squares fits, with error
bars estimated by shifting the derivatives of the polynomial fits by ±0.05 Å
(LCLS) and ±0.15 Å (SSRL) (Supplementary Information, section A.3.1).
Green triangles are LCLS data from 12-µm-diameter droplets; red circles are
LCLS data from 34- and 37-µm-diameter droplets; and black squares are
SSRL data from a static liquid sample. Purple diamonds are LCLS data from
9-µm-diameter droplets measured during a separate LCLS run with separate
q-calibration (Supplementary Information, section A.1.2). As the temperature
decreases in no man’s land, the positions of peaks S1 and S2 approach the
characteristic values of LDA ice (dash-dot blue lines) as determined from
neutron diffraction” and of clusters of hexagonal ice (ice Ih; dashed red lines;
Supplementary Information, section A.2.3).
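The peak positions in panel b come from local fifth-order polynomial fits, with error bars obtained by shifting the fitted derivative by a fixed amount. The following is a minimal sketch of that idea, assuming NumPy; the function name `peak_with_bounds`, the fit window and the grid search used to locate the maximum are illustrative choices, not the authors' code:

```python
import numpy as np
from numpy.polynomial import Polynomial


def peak_with_bounds(q, s, window, order=5, deriv_shift=0.05, n_grid=20001):
    """Locate a maximum of S(q) from a local polynomial least-squares fit.

    Returns (peak, lower, upper): the maximum of the fitted polynomial and
    the zero crossings of p'(q) -/+ deriv_shift closest to it, used here as
    error-bar bounds (hypothetical helper, for illustration only).
    """
    lo, hi = window
    mask = (q >= lo) & (q <= hi)
    p = Polynomial.fit(q[mask], s[mask], order)  # local least-squares fit
    dp = p.deriv()                               # derivative with respect to q
    grid = np.linspace(lo, hi, n_grid)
    peak = grid[np.argmax(p(grid))]              # maximum of the fitted curve

    def crossing(offset):
        # Sign changes of p'(q) + offset, keeping the one nearest the peak
        vals = dp(grid) + offset
        idx = np.nonzero(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
        if idx.size == 0:
            return np.nan
        candidates = grid[idx]
        return candidates[np.argmin(np.abs(candidates - peak))]

    # p'(q) = +shift lies below the maximum, p'(q) = -shift lies above it
    return peak, crossing(-deriv_shift), crossing(+deriv_shift)
```

Shifting the derivative by ±δ moves its zero crossing to either side of the maximum by roughly δ divided by the peak curvature, so broad, flat peaks receive wider bounds than sharp ones.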
Figure 4 | Temperature dependence of the tetrahedrality of liquid water.
a, Magnitude of the second g(r) peak, g2, as a function of the splitting, Δq,
between the S1 and S2 peaks from TIP4P/2005 molecular dynamics simulations
(dots). The inset illustrates g2 for g(r) at 340 K (red solid line) and 210 K (black
dashed line). b, Experimental g2 values, derived from measured Δq values
(labels as in Fig. 3b) and the fit to molecular dynamics data shown in a, with
error bars estimated from the maximum and minimum Δq values allowed by
the uncertainty in the S1 and S2 peak positions. Also shown is the fourth-order
polynomial least-squares fit to the experimental data (black solid line),
where the last (that is, low-T) two data points for the 12-µm-diameter droplets
and the last data point for the 9-µm-diameter droplets are ignored owing to


tetrahedrality of liquid water, as demonstrated by the similar
temperature dependences of g2 and the tetrahedrality, Q, commonly used in
molecular dynamics simulations” (Supplementary Fig. 21). The molecular
dynamics simulations furthermore show a clear correlation between
g2 and the splitting, Δq = q2 - q1, between the S1 and S2 peaks, where g2
increases monotonically with Δq (Fig. 4a). We use this relationship
to extract 'experimental' g2 values from the measured Δq data;
these are plotted against temperature in Fig. 4b and compared with g2
values from molecular dynamics simulations using the TIP4P/2005 and
SPC/E water models.
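The √(8/3)a second-neighbour distance quoted above follows from the law of cosines for two first neighbours separated by the tetrahedral angle arccos(-1/3) ≈ 109.47°. A short numerical check, offered as a worked example rather than as part of the original analysis (the function name is illustrative):

```python
import math


def tetrahedral_second_neighbour(a):
    """O-O distance between two first neighbours of a tetrahedrally
    coordinated oxygen, each at distance a from the central oxygen.

    Law of cosines with the tetrahedral angle theta = arccos(-1/3):
    d^2 = 2 a^2 (1 - cos theta) = (8/3) a^2.
    """
    theta = math.acos(-1.0 / 3.0)
    return math.sqrt(2.0 * a * a * (1.0 - math.cos(theta)))


d = tetrahedral_second_neighbour(2.8)
print(f"theta = {math.degrees(math.acos(-1 / 3)):.2f} deg, d = {d:.2f} Å")
```

With a = 2.8 Å this gives d ≈ 4.57 Å, consistent with the second g_OO(r) peak near 4.5 Å.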

The data in Fig. 4b show that as water is cooled from above its
freezing temperature, g2 and its rate of change increase continuously, as would
be expected from the accelerated growth of tetrahedral structures in
deeply supercooled water. We also note that the best-fit extrapolation
of g2 to low temperatures approaches the limit for LDA ice” (dash-dot
line in Fig. 4b), which is expected to be representative also of
crystalline hexagonal ice (Fig. 3b), although g2 cannot be uniquely defined for
a crystalline sample. Furthermore, g_OO(r) for LDA ice and for simulated
supercooled TIP4P/2005 water at 220 K (which has the same g2 value as the
maximum in the experimental data) are very similar, but differ from
g_OO(r) for water at ambient conditions (Fig. 4c). The structure of liquid
water in no man’s land is therefore distinct from that of water under
ambient conditions, exhibiting stronger tetrahedral ordering. The
largest g2 values obtained from the experimental data (Fig. 4b) coincide with
the observed onset of ice nucleation at ~229 K (Fig. 2), supporting
simulations'®”* which concluded that the increased abundance and
persistence of tetrahedrally coordinated water molecules play a central part
as a precursor structure for homogeneous ice nucleation.

We still observe a non-negligible number of water shots at the
farthest measurement point, reached about 1 ms after the temperature of
the corresponding water droplet has dropped to 229 (+2/-1) K, where
homogeneous nucleation sets in (Fig. 2). This implies that metastable water
can transiently exist on the millisecond timescale down to 227 (+2/-1) K. We
expect that these observations and our data mapping the structural
evolution of supercooled liquid water with decreasing temperature (Extended
Data Tables 1-4) will enable a quantitative evaluation of different
theoretical models that predict this structure in no man’s land and aim to
explain water’s anomalous physical properties.


high nonlinearity in the detector response, which artificially decreases g2
(Supplementary Information, section A.3.1). For comparison, the temperature
dependences of g2 for the TIP4P/2005 (red dashed line) and SPC/E (purple
dashed line) models are depicted along with the characteristic value of g2 for
LDA ice” (blue dash-dot line). c, The g(r) of TIP4P/2005 water at 220 K (black
solid line) bears a striking similarity to that of LDA ice” (red dashed line), whereas the
measured g(r) of room-temperature water** (blue dash-dot line) shows
significantly less structural correlation.


METHODS SUMMARY 


Deionized water (PURELAB Ultra Genetic; resistivity 18.2 MΩ cm at 298 K) was
measured at the Stanford Synchrotron Radiation Lightsource (SSRL) small-angle
X-ray scattering instrument (beam line 4-2) in a static, ~5-µl sample cell with an
absolute temperature uncertainty of ±1 K. The experimental details have been
described elsewhere’. X-ray scattering experiments on water droplets of diameters 9,
12, 34 and 37 µm were performed using the LCLS coherent X-ray imaging
instrument”; at deep supercooling (<260 K), the droplet temperatures were calibrated
according to the Knudsen theory of evaporation (Supplementary Information, section B.3).
These measurements overlap with those made at SSRL (>250 K). Molecular dynamics
simulations of evaporative cooling of droplets with radii of 1-4 nm were performed using
the TIP4P/2005 force field to verify the Knudsen theory of evaporation (Supplementary
Information, section B.2). The single-pulse water scattering patterns at LCLS were recorded
with the Cornell-SLAC pixel array detector and corrected for the dark signal, gain
variations, the polarization dependence of the X-ray scattering, solid-angle
differences and fluctuations in the average photon wavelength between X-ray pulses
(Supplementary Information, section A.1.2). More details on the procedure for selecting
water shots and on sample statistics are given in Supplementary Information,
section A.1.3. The total scattering structure factor was obtained from the averaged
angularly integrated intensities by removing the atomic form factor contribution
(Supplementary Information, section A.3.1). Measurements from both SSRL and
LCLS covered a scattering momentum transfer range of approximately 0.5 Å^-1 ≤ q
≤ 3.5 Å^-1. Large-scale (45,000 molecules) molecular dynamics simulations*® of
bulk TIP4P/2005 water and medium-scale (512 molecules) molecular dynamics
simulations of SPC/E water were performed for a wide range of temperatures to
calculate the total scattering structure factor (Supplementary Information, section
C.1.1), including the intermolecular oxygen-hydrogen and oxygen-oxygen partial
structure factors and the intramolecular oxygen-hydrogen partial structure factor (the
hydrogen-hydrogen contribution is negligible); these were used for the calibration
of g2 against Δq (Supplementary Information, section C.1.3).
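The correction for pulse-to-pulse wavelength fluctuations matters because the momentum transfer assigned to a fixed detector pixel depends on the wavelength through q = 4π sin(θ)/λ, where 2θ is the scattering angle. A minimal sketch of that conversion; the helper names and the wavelengths in the example are arbitrary illustrative values, not LCLS parameters:

```python
import math


def momentum_transfer(two_theta, wavelength):
    """q = 4*pi*sin(theta)/lambda for scattering angle two_theta (radians).

    With the wavelength in Å, q is returned in Å^-1.
    """
    return 4.0 * math.pi * math.sin(two_theta / 2.0) / wavelength


def scattering_angle(q, wavelength):
    """Inverse mapping: scattering angle 2*theta (radians) for a given q."""
    x = q * wavelength / (4.0 * math.pi)
    if not 0.0 <= x <= 1.0:
        raise ValueError("q is not reachable at this wavelength")
    return 2.0 * math.asin(x)


# The same pixel (fixed 2*theta) seen by two pulses with slightly different
# wavelengths maps to different q; the wavelengths are purely illustrative.
pixel_two_theta = scattering_angle(3.0, 1.30)
for lam in (1.30, 1.31):
    q = momentum_transfer(pixel_two_theta, lam)
    print(f"lambda = {lam:.2f} Å -> q = {q:.4f} Å^-1")
```

A longer wavelength maps the same pixel to a smaller q, so without a per-pulse correction the S1 and S2 peaks would be smeared when single-shot patterns are averaged.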


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
Extended Data Table 1 | Temperature-dependent S1 and S2 peak
positions for the 5-µl static sample


Temperature (K)    q1 (Å^-1)        q2 (Å^-1)
323 ± 1            2.177 ± 0.016    2.890 ± 0.014
308 ± 1            2.135 ± 0.013    2.928 ± 0.011
298 ± 1            2.108 ± 0.011    2.947 ± 0.009
288 ± 1            2.082 ± 0.009    2.966 ± 0.007
278 ± 1            2.055 ± 0.007    2.983 ± 0.006
273 ± 1            2.042 ± 0.007    2.991 ± 0.006
268 ± 1            2.026 ± 0.007    2.999 ± 0.006
263 ± 1            2.011 ± 0.006    3.006 ± 0.005
258 ± 1            1.994 ± 0.006    3.013 ± 0.005
253 ± 1            1.975 ± 0.006    3.019 ± 0.004
251 ± 1            1.967 ± 0.005    3.021 ± 0.004


The temperature-dependent S1 and S2 peak positions, q1 and q2 respectively, derived from the maxima
of local (fifth-order) polynomial least-squares fits for the 5-µl static sample measured at beamline 4-2 at
SSRL in January 2012. The error bars for the temperature correspond to the measurement accuracy
(IEC 584-2 standard for K-type thermocouples); the error bars for q1 and q2 were estimated by shifting
the derivatives of the polynomial fits by ±0.15 Å when determining the positions of the maxima
(Supplementary Information, section A.3.1).
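The splitting Δq = q2 - q1 used in Fig. 4 can be read directly off this table. A small sketch that computes Δq and the widest and narrowest splittings allowed by the q1 and q2 error bars (mirroring the maximum/minimum construction described in the Fig. 4 caption), using the values tabulated above; `TABLE_1` and `splitting_with_bounds` are illustrative names:

```python
# (T in K, q1, error in q1, q2, error in q2); q in Å^-1, from Extended Data Table 1
TABLE_1 = [
    (323, 2.177, 0.016, 2.890, 0.014),
    (308, 2.135, 0.013, 2.928, 0.011),
    (298, 2.108, 0.011, 2.947, 0.009),
    (288, 2.082, 0.009, 2.966, 0.007),
    (278, 2.055, 0.007, 2.983, 0.006),
    (273, 2.042, 0.007, 2.991, 0.006),
    (268, 2.026, 0.007, 2.999, 0.006),
    (263, 2.011, 0.006, 3.006, 0.005),
    (258, 1.994, 0.006, 3.013, 0.005),
    (253, 1.975, 0.006, 3.019, 0.004),
    (251, 1.967, 0.005, 3.021, 0.004),
]


def splitting_with_bounds(rows):
    """Return (T, dq, dq_min, dq_max) with dq = q2 - q1.

    dq_max = (q2 + e2) - (q1 - e1) and dq_min = (q2 - e2) - (q1 + e1);
    values are rounded to the table's three decimals.
    """
    out = []
    for temp, q1, e1, q2, e2 in rows:
        dq = q2 - q1
        out.append((temp, round(dq, 3), round(dq - e1 - e2, 3), round(dq + e1 + e2, 3)))
    return out


for temp, dq, lo, hi in splitting_with_bounds(TABLE_1):
    print(f"{temp} K: dq = {dq:.3f} Å^-1 (range {lo:.3f} to {hi:.3f})")
```

On these data Δq grows monotonically on cooling, from 0.713 Å^-1 at 323 K to 1.054 Å^-1 at 251 K, consistent with the increasing tetrahedrality discussed in the main text.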
Extended Data Table 2 | Temperature-dependent S1 and S2 peak
positions for the 34-37-µm-diameter droplets


Temperature (K)    q1 (Å^-1)        q2 (Å^-1)
258 (+5/-3)        1.992 ± 0.011    3.019 ± 0.021
251 (+3/-5)        1.950 ± 0.008    3.016 ± 0.014
247 (+2/-1)        1.939 ± 0.008    3.015 ± 0.011
243 ± 2            1.925 ± 0.008    3.039 ± 0.013
242 ± 1            1.915 ± 0.006    3.025 ± 0.011
239 ± 1            1.899 ± 0.006    3.025 ± 0.011
235 (+1/-?)        1.887 ± 0.008    3.052 ± 0.013
233 ± 1            1.872 ± 0.007    3.051 ± 0.011


The temperature-dependent S1 and S2 peak positions, q1 and q2 respectively, derived from the maxima
of local (fifth-order) polynomial least-squares fits for the 34-37-µm-diameter droplets measured using
the coherent X-ray imaging (CXI) instrument at LCLS in February 2011. The error bars for the
temperature were estimated by forcing overlap between experimental data sets using the Knudsen
theory of evaporation (Supplementary Information, section B.3.5); the error bars for q1 and q2 were
estimated by shifting the derivatives of the polynomial fits by ±0.05 Å when determining the positions
of the maxima (Supplementary Information, section A.3.1).
Extended Data Table 3 | Temperature-dependent S1 and S2 peak
positions for the 12-µm-diameter droplets


Temperature (K)    q1 (Å^-1)        q2 (Å^-1)
?                  1.971 ± 0.009    3.062 ± 0.008
237 (+2/-?)        1.901 ± 0.006    3.046 ± 0.010
232 (+2/-1)        1.868 ± 0.006    3.056 ± 0.010
229 (+2/-1)        1.850 ± 0.005    3.053 ± 0.010
228 (+2/-1)        1.869 ± 0.008    3.068 ± 0.013
227 (+2/-1)        1.867 ± 0.006    3.044 ± 0.010


The temperature-dependent S1 and S2 peak positions, q1 and q2 respectively, derived from the maxima
of local (fifth-order) polynomial least-squares fits for the 12-µm-diameter droplets measured using the
CXI instrument at LCLS in February 2011. The error bars for the temperature were estimated by forcing
overlap between experimental data sets using the Knudsen theory of evaporation (Supplementary
Information, section B.3.5); the error bars for q1 and q2 were estimated by shifting the derivatives of the
polynomial fits by ±0.05 Å when determining the positions of the maxima (Supplementary Information,
section A.3.1).
Extended Data Table 4 | Temperature-dependent S1 and S2 peak
positions for the 9-µm-diameter droplets


Temperature (K)    q1 (Å^-1)        q2 (Å^-1)
233 (?)            1.918 ± 0.010    3.066 ± 0.022
230 (+2/-1)        1.900 ± 0.009    3.076 ± 0.019
228 (+2/-1)        1.920 ± 0.013    3.087 ± 0.024


The temperature-dependent S1 and S2 peak positions, q1 and q2 respectively, derived from the maxima
of local (fifth-order) polynomial least-squares fits for the 9-µm-diameter droplets measured using the
CXI instrument at LCLS in January 2013. The error bars for the temperature were estimated by forcing
overlap between experimental data sets using the Knudsen theory of evaporation (Supplementary
Information, section B.3.5); the error bars for q1 and q2 were estimated by shifting the derivatives of the
polynomial fits by ±0.05 Å when determining the positions of the maxima (Supplementary Information,
section A.3.1).


©2014 Macmillan Publishers Limited. All rights reserved 




doi:10.1038/nature13405 


Metastable liquid-liquid transition in a molecular model of water


Jeremy C. Palmer, Fausto Martelli, Yang Liu, Roberto Car, Athanassios Z. Panagiotopoulos & Pablo G. Debenedetti


Liquid water’s isothermal compressibility’ and isobaric heat capacity’, 
and the magnitude of its thermal expansion coefficient’, increase sharply 
on cooling below the equilibrium freezing point. Many experimental**, 
theoretical?" and computational’*”’ studies have sought to under- 
stand the molecular origin and implications of this anomalous beha- 
viour. Of the different theoretical scenarios”’*”* put forward, one posits 
the existence of a first-order phase transition that involves two forms of 
liquid water and terminates at a critical point located at deeply super- 
cooled conditions”’”. Some experimental evidence is consistent with 
this hypothesis*”*, but no definitive proof of a liquid-liquid transition 
in water has been obtained to date: rapid ice crystallization has so far 
prevented decisive measurements on deeply supercooled water, although 
this challenge has been overcome recently'®. Computer simulations 
are therefore crucial for exploring water’s structure and behaviour in 
this regime, and have shown’*’””! that some water models exhibit 
liquid-liquid transitions and others do not. However, recent work’ 
has argued that the liquid-liquid transition has been mistakenly inter- 
preted, and is in fact a liquid-crystal transition in all atomistic models 
of water. Here we show, by studying the liquid-liquid transition in the 
ST2 model of water” with the use of six advanced sampling methods 
to compute the free-energy surface, that two metastable liquid phases 
and a stable crystal phase exist at the same deeply supercooled ther- 
modynamic condition, and that the transition between the two liquids 
satisfies the thermodynamic criteria of a first-order transition”. We 
follow the rearrangement of water’s coordination shell and topo- 
logical ring structure along a thermodynamically reversible path from 
the low-density liquid to cubic ice’®. We also show that the system 
fluctuates freely between the two liquid phases rather than crystalliz- 
ing. These findings provide unambiguous evidence for a liquid-liquid 
transition in the ST2 model of water, and point to the separation of 
time scales between crystallization and relaxation as being crucial for 
enabling it. 

Although several recent investigations using free-energy methods 
designed specifically to study phase transitions” have shown that the 
ST2 model of water undergoes a liquid-liquid transition’””’, other 
investigations” involving seemingly identical simulations using the 
same model found only a single liquid and a crystalline phase and con- 
cluded that what in reality is a crystallization transition had been mistaken 
for a liquid-liquid transition. Because there are stringent thermodynamic 
conditions that a first-order transition must satisfy, it is possible, albeit 
computationally expensive, to definitively verify or falsify the existence 
of a liquid-liquid transition. To this end we use six state-of-the-art free- 
energy methods (four of which are documented in Methods) and scaling 
analysis, and we construct a thermodynamically reversible path between 
the liquid and crystalline phases of the ST2 model. 

Figure 1a, b shows perspective and orthographic projections of the
free-energy surface for ST2 water at 228.6 K and 2.4 kbar, as a function of
density and an order parameter”’, Q6, that can distinguish crystalline states
from configurations lacking long-range order. It can be seen that two
disordered (low-Q6) phases of different density are in equilibrium (same
free energy) with each other, both of them being metastable with respect
to the crystal phase, which has a much lower free energy. To our
knowledge, this is the first time that two metastable liquid phases in
equilibrium with each other and with a third, stable crystalline phase have
been identified in a pure substance at the same temperature and pressure
in a computer simulation.

The system-size dependence of the free-energy barrier separating
coexisting phases is a stringent test of the presence of a true first-order
transition in computer simulations**”*”*. We have calculated the free-energy
surface in the low-Q6 region corresponding to the two liquids for system
sizes N = 192, 300, 400 and 600, with Fig. 2a showing that the
corresponding barrier heights satisfy the N^(2/3) scaling characteristic of
first-order transitions. This scaling is a consequence of the surface free energy
increasing as the interface between the liquids grows with system size””*”>:
at fixed density the interfacial area scales as L^2, that is, as N^(2/3).
For the range of system sizes examined, the interface manifests itself
through the formation of water clusters with local environments
characteristic of each distinct liquid phase. Figure 2b shows an example of this
behaviour in a water configuration taken from a simulation performed
near the barrier region for N = 600. Because non-zero average Q6 values
in an amorphous phase arise solely from fluctuations in finite systems, this
quantity must also exhibit a system-size dependence”. Figure 2c shows
that Q6 vanishes as N^(-1/2), in agreement with the theoretical expectation”,
thereby confirming the amorphous character of the low-density liquid
(LDL) phase.
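One simple way to check the two scaling laws above is to fit a straight line to log(value) against log(N): a barrier growing as N^(2/3) gives a slope near 2/3, and a mean Q6 vanishing as N^(-1/2) gives a slope near -1/2. The barrier heights themselves are not tabulated in this text, so the sketch below exercises the fit on synthetic, noise-free values only; `power_law_exponent` is an illustrative name:

```python
import math


def power_law_exponent(sizes, values):
    """Least-squares slope of log(value) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


sizes = [192, 300, 400, 600]  # system sizes studied in the text
# Synthetic values (not simulation results) used only to exercise the fit
barriers = [0.5 * n ** (2 / 3) for n in sizes]
mean_q6 = [0.4 * n ** -0.5 for n in sizes]
print(round(power_law_exponent(sizes, barriers), 3),
      round(power_law_exponent(sizes, mean_q6), 3))
```

With real data, the scatter of the fitted slope around 2/3 or -1/2, together with the bootstrap error bars, indicates how strongly the scaling is supported over this limited range of N.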

Figure 3 shows the free-energy surface computed from standard Monte
Carlo (MC) simulations at fixed temperature and pressure, during which
each system was sampled for 100 relaxation times without the imposition
of any constraint. Over the course of these long simulations, the systems
fluctuate between the two coexisting phases often enough to allow
the calculation of the free-energy surface, which is in excellent agreement
with the results shown in Fig. 1b for the low-Q6 region and also with those
obtained from the four other sampling techniques (see Methods and
Extended Data Fig. 1). During this time, Q6 remains invariably in the
amorphous region and the systems show no evidence of crystallization.
The LDL phase exhibits slow dynamics, and proper scrutiny of a
metastable state requires sampling over times that comfortably exceed
the system’s structural relaxation time, while being significantly shorter
than the nucleation time of the stable ice phase. The latter’s density is very
similar to that of LDL (Fig. 1). Ice nucleation, should it occur, takes place
within LDL”, rather than from the high-density liquid (HDL). The inset
to Fig. 3 shows the relaxation of fluctuations in density and structural
order (Q6) in LDL. It can be seen that, after an intermediate transient
period during which these processes are separated by as much as two
orders of magnitude, fluctuations in density and Q6 decay on very similar
time scales. As documented in Methods, this is a general result, but the
transient behaviour is sensitive to the particular algorithm used to
sample configurations (Extended Data Fig. 2). The results of Fig. 3 confirm
that the LDL phase is a properly equilibrated liquid, and that under the 
conditions investigated here, the characteristic time for nucleation of 
the stable ice phase is much longer than the structural relaxation time for 
the LDL phase in the ST2 model of water. The key role of kinetics in stabi- 
lizing the liquid-liquid transition is further emphasized by the fact that 
Figure 1 | Thermodynamic equilibrium between metastable liquid
polymorphs. a, Reversible free-energy surface (F = free energy, β = 1/k_BT)
described by density and the crystalline order parameter, Q6, for 192 ST2
water molecules at a point of liquid-liquid coexistence (228.6 K and 2.4 kbar).
b, An orthographic projection of the free-energy surface shown in a. The
HDL and LDL basins (ρ ~ 1.15 g cm^-3 and ρ ~ 0.90 g cm^-3, respectively)
located at Q6 ~ 0.05 are separated by a ~4 k_BT free-energy barrier and are
metastable with respect to cubic ice (Q6 ~ 0.52, ρ ~ 0.90 g cm^-3) by ~75 k_BT at
this temperature and pressure. The average uncertainty in the free-energy
surface is less than 1 k_BT. Contours are 1 k_BT apart.


the system fluctuates freely between the two liquid basins in uncon- 
strained simulations (Fig. 3) without crystallization, even though the bar- 
rier separating the two liquids is comparable to that separating LDL and 
ice (Fig. 1). 

Figures 1 and 3 show that the metastable liquids are not distinguished
by Q6 because of their amorphous nature, suggesting that other order
parameters must be used to characterize their structure. The local structure
index’ (LSI) is an order parameter that quantifies the extent to which a 
molecule possesses a tetrahedral environment with well-separated first 
and second coordination shells. Figure 4 shows the free-energy surface of 
Figure 2 | Finite-size scaling and the liquid-liquid interface. a, Free-energy
barrier separating the HDL and LDL basins computed at coexistence for
systems containing N = 192, 300, 400 and 600 ST2 water molecules. The barrier
height increases with system size, obeying the N^(2/3) scaling law expected for a
first-order phase transition. Error bars were computed using the bootstrap
analysis described in Methods. b, Large clusters are formed near the barrier
region by water molecules with local coordination environments characteristic
of HDL and LDL (blue and red molecules, respectively). The local structure
index order parameter, I, described in the text and Methods was used to
characterize each molecule’s local environment, with blue molecules (HDL)
having I ≤ 0.12 Å^2 and red molecules (LDL) having I > 0.12 Å^2. The green
simulation box containing 600 ST2 water molecules has been replicated across
its periodic boundaries to illustrate that the clusters span the length of the
unit cell. c, The mean value of Q6 averaged over the LDL basin decreases with
system size, scaling as N^(-1/2) and confirming the disordered nature of the
liquid. The symbol size is larger than the estimated uncertainty for ⟨Q6⟩.


ST2 water at 228.6 K and 2.4 kbar for N = 192 projected onto the space
parameterized by the first moment of the molecular LSI distribution,
I, and Q6. Water molecules within the HDL phase have a disordered
coordination structure, resulting in I ~ 0, because of the presence of
interstitial molecules residing between the first and second neighbour shells
that disrupt local tetrahedral order. The coordination structure of LDL
(I ~ 0.15 Å^2) is more ordered, with two distinct neighbour shells that
give rise to its ice-like density. The LSI parameter also distinguishes the
ice phase (I ~ 0.25 Å^2), with its well-defined coordination structure that
exhibits long-range, crystalline order. The inset to Fig. 4 shows that the
changes in the coordination structure along the HDL-LDL and LDL-crystal
paths are accompanied by large topological rearrangements
described by the first moment of the ring size distribution, R. The average
ring size decreases monotonically along the HDL-LDL path, suggesting
a continuous rearrangement process. In contrast, abrupt, non-monotonic
behaviour is observed along the transition from LDL to the crystal in the
vicinity of the saddle point in the I-Q6 free-energy surface, which is
consistent with structural rearrangements that have been observed in ice
nucleation trajectories taken from long molecular-dynamics simulations
of the TIP4P water model”’. We note, however, that the system size
examined here, although suitable for accurate free-energy calculations,
may be insufficient to provide information about the mechanisms
governing the ice nucleation and growth process. Such behaviour should
therefore be investigated in future studies using larger systems.
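The LSI itself is defined in the Methods, which this excerpt does not include. The sketch below assumes the commonly used definition of Shiratani and Sasai: the O-O distances from molecule i are sorted, n of them lie within a cutoff of 3.7 Å, and I(i) is the variance of the gaps between consecutive sorted distances, including the gap to the first molecule beyond the cutoff. Both the definition and the 3.7-Å cutoff are assumptions to be checked against the Methods, and the function names are illustrative:

```python
def local_structure_index(distances, cutoff=3.7):
    """LSI of one molecule from its O-O distances to other molecules (Å).

    Assumed definition: sort r_1 < r_2 < ...; n neighbours lie within
    `cutoff`; I = (1/n) * sum_j (Delta_j - mean Delta)^2 with
    Delta_j = r_{j+1} - r_j for j = 1..n (the gap across the cutoff is
    included). The result is in Å^2.
    """
    r = sorted(distances)
    n = sum(1 for d in r if d < cutoff)
    if n == 0 or n >= len(r):
        raise ValueError("need neighbours both inside and beyond the cutoff")
    gaps = [r[j + 1] - r[j] for j in range(n)]
    mean_gap = sum(gaps) / n
    return sum((g - mean_gap) ** 2 for g in gaps) / n


def mean_lsi(per_molecule_distances, cutoff=3.7):
    """First moment of the LSI distribution over one configuration."""
    values = [local_structure_index(d, cutoff) for d in per_molecule_distances]
    return sum(values) / len(values)


# Four first neighbours and a clear gap to the second shell (ice-like)
ordered = [2.8, 2.8, 2.8, 2.8, 4.5, 4.6]
# Interstitial molecules filling the 3-3.7 Å range (HDL-like)
crowded = [2.8, 2.9, 3.0, 3.2, 3.4, 3.6, 3.8]
print(local_structure_index(ordered), local_structure_index(crowded))
```

A cleanly separated first shell produces one large gap and hence a large I, whereas interstitial molecules spread the distances evenly and push I towards zero, matching the HDL, LDL and ice trend described above.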

Our free-energy calculations demonstrate that the ST2 model of water 
exhibits a liquid-liquid phase transition under deeply supercooled condi- 
tions. An emerging question is to understand which aspects of intermol- 
ecular interactions cause some water models to undergo a liquid-liquid 
transition with well-separated relaxation and crystallization times, whereas 
Figure 3 | Free-energy surface from unconstrained simulations. The ρ-Q6
free-energy surface at 228.6 K and 2.4 kbar computed from 16 unconstrained MC
simulations initialized in the low-Q6 region. Contours are separated by 1 k_BT.
Because of the separation of timescales between structural relaxation in the
liquid phase and ice nucleation, each simulation was run for more than ~100
relaxation times without exhibiting any sign of crystallization. The inset shows
autocorrelation functions for density (blue line) and Q6 (red line) computed from
the unconstrained MC simulations performed in the LDL region. Fluctuations
in density and structural order (Q6) decay in tandem on timescales that are
relevant to relaxations within the liquid phase, as demonstrated by both order
parameters having mean autocorrelation times of ~10^(?) MC moves.
Figure 4 | Structural and topological order in the metastable coexisting
liquids and in cubic ice. The free-energy surface at 228.6 K and 2.4 kbar described
by the first moment of the molecular local structure index distribution, I, and
the crystalline order parameter, Q6. Contours are 1 k_BT apart. Parameter I
successfully distinguishes the three phases based on structural order,
characterizing the extent to which molecules in each phase possess a tetrahedral
environment with well-separated first and second coordination shells. The inset
shows that the three phases have distinctive topological features characterized
by the first moment of the ring size distribution, R.
other models do not show this behaviour. The present results suggest
that constraints associated with the breaking and forming of hydrogen 
bonds, present in ST2 (ref. 24) but not in coarse-grained models"’, have 
an important role. Further research using state-of-the-art free-energy 
methods, such as those employed here, can provide insights into this 
question and may thereby also improve our understanding of the phase 
behaviour of real water under deeply supercooled conditions. 


METHODS SUMMARY 


The reversible free-energy surface described by density, ρ, and the bond-orientational
order parameter, Q6, was computed for the Ewald-compatible variant of the ST2
water model** described by Liu et al.’ using MC simulations in the isothermal-isobaric
ensemble, augmented with collective, N-particle rotational and translational
MC moves and umbrella sampling”. A harmonic umbrella bias potential was used
to restrict each MC simulation to a different window in ρ-Q6 parameter space. Each
simulation used to generate Fig. 1 was equilibrated for ~10^(?) τ_Q6, followed by a
production phase of equal or greater duration, where τ_Q6 is the integrated
autocorrelation time”? for Q6 in the sampling window. Two-dimensional ρ-Q6 histograms
were generated from uncorrelated samples collected in each umbrella window. The
histograms were subsequently combined” to produce an unbiased estimate of the
free energy. Special care was taken to ensure reversibility in the low-density region
(ρ < 0.98 g cm^-3), enhancing sampling of degrees of freedom associated with
structural order by performing Hamiltonian exchange MC moves”, in which
umbrella restraint parameters were swapped between simulations in adjacent
windows along Q6. Bidirectional sampling was also performed in this region, seeding
two separate generations of simulations with initial configurations extracted from
a freezing (LDL → ice Ic) or melting (ice Ic → LDL) trajectory. Reversibility was
explicitly checked by comparing histograms from each generation of simulations to
monitor for hysteresis (path dependence). Saved simulation trajectories were
analysed to examine the structural and topological properties of each phase identified in
the free-energy surface. The final data sets were subjected to critical scrutiny and
were found to be insensitive to the sampling methodology and duration, yielding
estimates for the ice I-HDL free-energy difference and the HDL-LDL surface
tension in harmony with independent simulations and thermodynamic expectations
(see Extended Data).
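Combining biased umbrella-window histograms into one unbiased distribution is the job of the weighted histogram analysis method (WHAM); the paper uses the two-dimensional ρ-Q6 formulation of Kumar et al. The sketch below shows the self-consistent WHAM iteration in one dimension only (a 2D grid can be flattened into a single bin index), with bias energies in units of k_BT. It is an illustration under those assumptions, not the authors' implementation, and `wham_1d` is a hypothetical name:

```python
import numpy as np


def wham_1d(counts, bias, tol=1e-10, max_iter=100000):
    """Minimal 1-D WHAM sketch.

    counts : (K, M) histogram counts n_k(x_m) from K umbrella windows
    bias   : (K, M) bias energies w_k(x_m) in units of k_B T
    Returns the normalised unbiased distribution P(x_m) and the window
    free energies f_k (units of k_B T, with f_0 = 0).
    """
    counts = np.asarray(counts, dtype=float)
    bias = np.asarray(bias, dtype=float)
    n_k = counts.sum(axis=1)          # total samples in each window
    total = counts.sum(axis=0)        # pooled counts in each bin
    f = np.zeros(counts.shape[0])
    for _ in range(max_iter):
        # P(x) = sum_k n_k(x) / sum_k N_k exp(f_k - w_k(x))
        denom = (n_k[:, None] * np.exp(f[:, None] - bias)).sum(axis=0)
        p = total / denom
        p /= p.sum()
        # exp(-f_k) = sum_x P(x) exp(-w_k(x))
        f_new = -np.log((p[None, :] * np.exp(-bias)).sum(axis=1))
        f_new -= f_new[0]             # fix the arbitrary additive constant
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return p, f
```

The unbiased free-energy profile then follows as βF(x) = -ln P(x), up to an additive constant, in line with the F = -k_BT ln P relation given in the Methods.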


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS

General sampling protocol. MC simulations in the isothermal-isobaric ensemble
employing collective, N-particle smart MC moves” were used to investigate the
low-temperature phase behaviour of the ST2 water model”, modified for compatibility
with the Ewald treatment for long-range electrostatic interactions'””. The ρ-Q6
range relevant to each phase under consideration was explored systematically using
windowed umbrella sampling*’*’. The parameter space was divided into overlapping
windows. Independent MC simulations were performed in each window, restricting
sampling to the target region with a harmonic restraint:


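$$W(\mathbf{r}^N) = \frac{k_\rho}{2}\left[\rho(\mathbf{r}^N) - \rho^*\right]^2 + \frac{k_{Q_6}}{2}\left[Q_6(\mathbf{r}^N) - Q_6^*\right]^2 \qquad (1)$$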

where r^N is the vector describing the microscopic coordinates of the N-particle system,
k_ρ and k_Q6 are spring constants, and the parameters ρ* and Q6* specify the window’s
centre. Values ranging from 5,000 k_BT to 10,000 k_BT (cm^6 g^-2) for k_ρ and from
2,000 k_BT to 6,000 k_BT for k_Q6 proved sufficient to ensure that the simulations
sampled in the vicinity of their target window. Technical details regarding the basic
MC algorithm, implementation of the ST2 water model, and definition and
calculation of Q6 are described in ref. 19.
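Equation (1) can be evaluated directly. A restraint of the form k x^2/2 (in k_BT) acting alone would confine x to a Gaussian of standard deviation 1/sqrt(k); ignoring the underlying free-energy landscape, the quoted k_ρ range therefore corresponds to density spreads of roughly 0.010 to 0.014 g cm^-3, comparable to the 0.01 g cm^-3 window spacing given later in the Methods, which is what lets neighbouring windows overlap. A minimal sketch; the function names are illustrative:

```python
import math


def umbrella_bias(rho, q6, rho_star, q6_star, k_rho, k_q6):
    """Harmonic restraint of Eq. (1), returned in units of k_B T.

    rho in g cm^-3 with k_rho in k_B T cm^6 g^-2; q6 is dimensionless
    with k_q6 in k_B T.
    """
    return 0.5 * k_rho * (rho - rho_star) ** 2 + 0.5 * k_q6 * (q6 - q6_star) ** 2


def window_width(k):
    """Standard deviation of exp(-k x^2 / 2): the spread a restraint alone
    would allow around its centre (free-energy landscape ignored)."""
    return 1.0 / math.sqrt(k)


for k in (5000.0, 10000.0):
    print(f"k_rho = {k:.0f}: sigma_rho ~ {window_width(k):.4f} g cm^-3")
for k in (2000.0, 6000.0):
    print(f"k_Q6 = {k:.0f}: sigma_Q6 ~ {window_width(k):.4f}")
```

For example, with k_ρ = 5,000 k_BT cm^6 g^-2 a density 0.02 g cm^-3 away from the window centre costs 1 k_BT.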

Free-energy analysis. Time series data were collected for ρ and Q6 in each umbrella
window during the post-equilibration production phase of the MC simulations. The
data were subsequently resampled with an interval equal to the maximum statistical
inefficiency in each window, g = 1 + 2 × max(τ_ρ, τ_Q6), where τ_ρ and τ_Q6 are the
integrated autocorrelation times associated with ρ and Q6, respectively. The
relaxation times for each observable were typically found to be comparable in magnitude
(that is, τ_ρ ≈ τ_Q6), including within sampling windows in the vicinity of the LDL
basin. Two-dimensional ρ-Q6 histograms were generated from the uncorrelated
time series data and subsequently combined using the weighted histogram analysis
method of Kumar et al.** to produce an unbiased estimate of the free energy,
F(ρ,Q6) = -k_BT ln[P(ρ,Q6)], where P is the microstate probability distribution.
Points of liquid-liquid coexistence, where the LDL and HDL basins have equal 
depths, were located by reweighting in pressure”: 


$$F(\rho, Q_6; p + \Delta p, T) = F(\rho, Q_6; p, T) + \Delta p\, N/\rho \qquad (2)$$


where $\Delta P$ is the pressure shift. Uncertainties in $F(\rho, Q_6; P + \Delta P, T)$ were estimated
from the variance computed from 500 resampled $\rho$–$Q_6$ free-energy surfaces generated
using the Bayesian bootstrap technique described by Hub et al. This technique
has been shown to provide robust error estimates even in extreme cases where the
sampling duration is limited to timescales on the order of the characteristic relaxation
time of the biased observable.
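Equation (2) and the search for equal basin depths can be sketched as follows. This is a minimal illustration under stated assumptions: $\rho$ is taken to be a mass density in g cm⁻³, so the term $\Delta P N/\rho$ is evaluated as $\beta\Delta P V$ with $V = N m_{\mathrm{H_2O}}/\rho$; free energies are in $k_BT$ on a grid with $\rho$ along the first axis; and the LDL and HDL basins are separated by a user-chosen dividing density `rho_split`. All function names are hypothetical, and the synthetic double-well profile in the `__main__` section is invented only to exercise the code.

```python
import numpy as np

K_B = 1.380649e-23      # Boltzmann constant, J K^-1
N_A = 6.02214076e23     # Avogadro constant, mol^-1
M_WATER = 18.015        # molar mass of water, g mol^-1 (assumed)


def pressure_term_kbt(rho, n_molecules, temperature, delta_p_bar):
    """beta * dP * V for each density, with V = N m / rho (rho in g cm^-3)."""
    volume_m3 = n_molecules * M_WATER / N_A / np.asarray(rho, dtype=float) * 1e-6
    return delta_p_bar * 1e5 * volume_m3 / (K_B * temperature)


def reweight_in_pressure(free_energy, rho, n_molecules, temperature, delta_p_bar):
    """Equation (2): F at P + dP from F at P (kBT units, rho along axis 0)."""
    f = np.asarray(free_energy, dtype=float)
    shift = pressure_term_kbt(rho, n_molecules, temperature, delta_p_bar)
    return f + shift.reshape((-1,) + (1,) * (f.ndim - 1))


def coexistence_shift(free_energy, rho, rho_split, n_molecules, temperature,
                      lo_bar=-5000.0, hi_bar=5000.0, tol_bar=1e-6):
    """Bisect for the pressure shift (bar) giving equally deep LDL and HDL basins."""
    low = np.asarray(rho) < rho_split

    def depth_gap(dp):
        # LDL minimum minus HDL minimum; increases monotonically with dp
        f = reweight_in_pressure(free_energy, rho, n_molecules, temperature, dp)
        return f[low].min() - f[~low].min()

    if depth_gap(lo_bar) > 0 or depth_gap(hi_bar) < 0:
        raise ValueError("bracket does not contain the coexistence pressure shift")
    lo, hi = lo_bar, hi_bar
    while hi - lo > tol_bar:
        mid = 0.5 * (lo + hi)
        if depth_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Synthetic 1-D double well (kBT); the LDL well is 2 kBT deeper at the reference pressure
    rho = np.linspace(0.88, 1.15, 271)
    f = np.minimum(5000.0 * (rho - 0.92) ** 2 - 2.0, 5000.0 * (rho - 1.05) ** 2)
    dp = coexistence_shift(f, rho, 0.985, n_molecules=192, temperature=228.6)
    print(f"pressure shift to coexistence: {dp:.1f} bar")
```

Because raising the pressure penalizes the low-density basin more than the high-density one, the depth difference is monotonic in $\Delta P$, which is what makes simple bisection adequate here.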

Computing the three-phase diagram. Umbrella sampling MC simulations of
192 ST2 water molecules at 228.6 K and 2.2 kbar were used to compute the reversible
free-energy surface in Fig. 1. The high-density region ($\rho \geq$ 0.98 g cm⁻³) was
sampled by performing independent simulations in 27 density windows in the range
0.98 g cm⁻³ $\leq \rho^* \leq$ 1.24 g cm⁻³ using a spacing of 0.01 g cm⁻³ and $Q_6^*$ = 0.05. Simulations
in the low-density region ($\rho <$ 0.98 g cm⁻³) were carried out in four density
windows, namely $\rho^*$ = 0.91, 0.93, 0.95 and 0.97 g cm⁻³. Sampling along $Q_6$ was
enhanced at each of the four target densities using Hamiltonian exchange MC
moves, in which attempts were made to swap parameters $k_{Q_6}$ and $Q_6^*$ between
replicas in neighbouring $Q_6$ windows. Two independent sets of replicas were used for
each value of $\rho^*$ in the low-density region. The first set comprised 16 replicas evenly
distributed over the range $0.02 \leq Q_6^* \leq 0.17$, and 32 replicas were used in the second
group to span the interval $0.16 \leq Q_6^* \leq 0.625$. Exchange attempts were made between
even- or odd-numbered replica pairs with equal probability once every 200 MC moves
on average.
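The window layout and the Hamiltonian exchange step described above can be sketched as follows. This is a minimal illustration, assuming each replica in a set shares the same density restraint (same $\rho^*$ and $k_\rho$), so that only the $Q_6$ restraint changes when $k_{Q_6}$ and $Q_6^*$ are swapped; energies are in $k_BT$, and all function names are hypothetical. The acceptance rule is the standard Metropolis criterion for exchanging bias parameters between two replicas.

```python
import math
import random

import numpy as np


def window_centres(lo, hi, spacing):
    """Evenly spaced umbrella-window centres from lo to hi inclusive."""
    count = int(round((hi - lo) / spacing)) + 1
    return np.linspace(lo, hi, count)


def harmonic(centre, k):
    """Q6 part of the restraint, W = k/2 (Q6 - centre)^2, in kBT."""
    return lambda q6: 0.5 * k * (q6 - centre) ** 2


def hrex_log_acceptance(q_i, q_j, bias_i, bias_j):
    """Log Metropolis probability for swapping the biases of replicas i and j.

    q_i, q_j are the Q6 values of the configurations currently held by i and j.
    The physical energy and the shared density restraint are unchanged by the
    swap, so only the Q6 restraints enter.
    """
    delta = bias_i(q_j) + bias_j(q_i) - bias_i(q_i) - bias_j(q_j)
    return min(0.0, -delta)


def attempt_swap(q_i, q_j, bias_i, bias_j, rng):
    """Accept or reject one exchange with the Metropolis criterion."""
    return rng.random() < math.exp(hrex_log_acceptance(q_i, q_j, bias_i, bias_j))


def choose_pairs(n_replicas, rng):
    """Even pairs (0,1),(2,3),... or odd pairs (1,2),(3,4),... with equal probability."""
    start = rng.randrange(2)
    return [(i, i + 1) for i in range(start, n_replicas - 1, 2)]


if __name__ == "__main__":
    dense = window_centres(0.98, 1.24, 0.01)   # high-density windows
    set_a = np.linspace(0.02, 0.17, 16)         # first replica set along Q6*
    set_b = np.linspace(0.16, 0.625, 32)        # second replica set along Q6*
    print(len(dense), set_a[1] - set_a[0], set_b[1] - set_b[0])
```

The window count follows from the quoted range and spacing: (1.24 − 0.98)/0.01 + 1 = 27.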

Simulations were equilibrated for ~10* $\tau_{Q_6}$ in each sampling window. Density, $Q_6$
and the configurational energy were carefully monitored for drift to verify that each
simulation had completely equilibrated by the end of this period. Bi-directional
sampling between the LDL and ice phase was also performed to serve as an additional
check for equilibration in the low-density region. The first generation of simulations
was seeded using initial configurations extracted from a trajectory of the LDL
freezing into ice Ic, and the second generation was initialized with configurations
taken from a melting trajectory. The freezing and melting trajectories were produced
by applying a strong umbrella bias to accelerate the phase transition process. After
equilibration, data collection was performed in each window for ~10* $\tau_{Q_6}$. Histograms
generated using data collected from the two generations of simulations in the
low-density region were compared to explicitly check for reversibility. The absence
of hysteresis confirmed that the simulations were properly equilibrated and sampling
the reversible $\rho$–$Q_6$ free-energy surface.

Finite-size scaling. Umbrella sampling calculations for N = 192, 300, 400 and 600
ST2 water molecules were performed in the low-$Q_6$ region, using 35 evenly distributed
density windows in the range 0.90 g cm⁻³ $\leq \rho^* \leq$ 1.24 g cm⁻³. Simulations in
each window were equilibrated for at least 10° $\tau_{Q_6}$, followed by a production phase of
similar duration. Comparison of the results for N = 192 with the more extensive calculations
used to generate Fig. 1 provided verification that the sampling duration and
explored range of $Q_6$ were sufficient to accurately reproduce the low-$Q_6$ portion of the
free-energy surface. For each system size considered, the height of the barrier separating
the liquids was computed at coexistence from the free-energy profile along $\rho$:


$$F(\rho) = -k_BT \ln\left(\int \exp\left[-\beta F(\rho, Q_6)\right] \mathrm{d}Q_6\right) \qquad (3)$$

where $\beta = (k_BT)^{-1}$.
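Equation (3) amounts to a log-sum-exp over the $Q_6$ axis, after which the barrier can be read off the one-dimensional profile. The sketch below is a minimal illustration, assuming a gridded surface in $k_BT$ with $\rho$ ascending along the first axis, a Riemann-sum approximation of the integral, and a user-chosen dividing density `rho_split` between the basins; the function names are hypothetical.

```python
import numpy as np


def project_onto_rho(free_energy, d_q6=1.0):
    """Equation (3): F(rho) = -ln( sum_Q6 exp[-F(rho, Q6)] dQ6 ), all in kBT.

    free_energy has rho along axis 0 and Q6 along axis 1. The integral is a
    Riemann sum evaluated with a numerically stable log-sum-exp; d_q6 only
    adds a constant to the profile.
    """
    neg = -np.asarray(free_energy, dtype=float)
    peak = neg.max(axis=1)
    log_integral = peak + np.log(np.exp(neg - peak[:, None]).sum(axis=1) * d_q6)
    return -log_integral


def barrier_heights(profile, rho, rho_split):
    """Height of the barrier top above the LDL and HDL minima (rho ascending)."""
    profile = np.asarray(profile, dtype=float)
    rho = np.asarray(rho, dtype=float)
    low = np.flatnonzero(rho < rho_split)
    high = np.flatnonzero(rho >= rho_split)
    i_ldl = low[np.argmin(profile[low])]
    i_hdl = high[np.argmin(profile[high])]
    top = profile[i_ldl:i_hdl + 1].max()
    return top - profile[i_ldl], top - profile[i_hdl]
```

At coexistence the two returned heights coincide; away from it they differ by the basin depth difference, which is a quick check that the pressure reweighting has been applied.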

Unconstrained sampling. The $\rho$–$Q_6$ free-energy surface in Fig. 3 was computed
by performing long, unbiased MC simulations of 192 ST2 water molecules at the
estimated point of liquid–liquid coexistence (228.6 K and 2.4 kbar). Equilibrated HDL
and LDL configurations extracted from umbrella sampling calculations were used
to initialize eight independent simulations in the vicinity of each liquid basin. The
unbiased simulations were run for two orders of magnitude longer than the integrated
$Q_6$ autocorrelation time in the LDL region. Time series data collected over the
duration of each MC simulation were analysed, as described above, to compute the free
energy.

Analysis of local structure index and topological rings. The LSI is an order
parameter sensitive to heterogeneity in water's coordination shell, capable of distinguishing
between molecular configurations characteristic of HDL, LDL and ice. The
free-energy surface parameterized by the first moment of the molecular LSI distribution,
$I$, and $Q_6$ was computed from time series data generated by analysing saved
trajectories from the long umbrella sampling simulations used to construct Fig. 1.
The uncorrelated data were subsequently re-weighted to remove the bias, generating
the final estimate of $F(I, Q_6)$ shown in Fig. 4. A 0.37-nm cutoff based on the
O–O separation distance between neighbouring water molecules was used in the
calculation of $I$. Additional details regarding the definition and calculation of $I$ may
be found in ref. 28. Ring statistics based on King's criteria were computed from
saved trajectories at selected points along the HDL–LDL and LDL–ice paths, applying
a 0.35-nm oxygen-based cutoff to determine topological connectivity between
adjacent water molecules.
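The text defers the definition of $I$ to ref. 28. As a hedged sketch, the code below assumes the commonly used Shiratani–Sasai form of the local structure index, evaluated with the 0.37-nm cutoff quoted above; it is not necessarily identical to the authors' implementation. Distances are assumed to be minimum-image O–O distances already computed elsewhere, and the function names are hypothetical.

```python
import numpy as np


def local_structure_index(distances, cutoff=0.37):
    """LSI of one molecule (assumed Shiratani-Sasai form).

    distances -- O-O distances (nm) from the central molecule to other molecules
    cutoff    -- 0.37 nm, as in the text
    With sorted distances r_1 < ... < r_n < cutoff < r_{n+1} and gaps
    D_i = r_{i+1} - r_i (i = 1..n), I = (1/n) sum_i (D_i - mean(D))^2.
    """
    r = np.sort(np.asarray(distances, dtype=float))
    n = int(np.searchsorted(r, cutoff))          # neighbours strictly inside the cutoff
    if n == 0 or n >= r.size:
        raise ValueError("need neighbours both inside and beyond the cutoff")
    gaps = np.diff(r[: n + 1])                   # includes the first molecule beyond the cutoff
    return float(np.mean((gaps - gaps.mean()) ** 2))


def mean_lsi(distance_lists, cutoff=0.37):
    """First moment of the molecular LSI distribution over a set of molecules."""
    return float(np.mean([local_structure_index(d, cutoff) for d in distance_lists]))
```

A well-separated first shell followed by a clear gap (LDL- or ice-like) gives a large $I$, whereas evenly spaced neighbours with molecules intruding into the gap (HDL-like) drive $I$ towards zero.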

Consistency among sampling methods. To verify that our results withstand critical
scrutiny, we have studied the dependence of the free-energy surface on sampling
duration and methodology, computing the $\rho$–$Q_6$ free-energy surface at 228.6 K and
2.4 kbar using several state-of-the-art computational techniques. Extended Data
Table 1 lists the methods we have used, along with $\tau_\rho$ and $\tau_{Q_6}$ computed in the LDL
basin, and the sampling duration in each umbrella window. Four and sixteen identical
simulation replicas were used in the well-tempered metadynamics and unconstrained
MC calculations, respectively, with each replica being run for the reported
duration.

The free-energy surfaces computed using the different sampling methods are shown
in Extended Data Fig. 1. In each case we find two coexisting liquids separated by
a ~4 $k_BT$ free-energy barrier, demonstrating that such results are independent of
sampling technique. Extended Data Fig. 1 also demonstrates that the results are
devoid of non-equilibrium artefacts. Limmer and Chandler have suggested that the
LDL basin is such an artefact associated with the sluggish dynamics of ice coarsening,
and consequently it was posited that the LDL basin should progressively age as the
sampling duration increases, until it eventually vanishes at ~10° $\tau_{Q_6}$ (ref. 23). In contrast,
we do not observe significant changes even when the sampling duration is
increased by two orders of magnitude, from 10° to 10* $\tau_{Q_6}$. As shown in Extended Data
Fig. 1, the techniques that yield such satisfyingly consistent free-energy surfaces
include the hybrid MC sampling method employed by Limmer and Chandler.
Figure 1 shows that consistent results are obtained even when reversible sampling
is performed between the LDL and ice Ic basins. Finally, our results are qualitatively
consistent with free-energy calculations employing different variants of the
ST2 water model, microsecond-long MD trajectories exhibiting abrupt and infrequent
transitions between HDL and LDL, and previous finite-size scaling studies.

Limmer and Chandler have proposed a theory of artificial polyamorphism, which
posits that a purported separation of timescales between density and structural relaxations
(that is, $\tau_{Q_6} \gg \tau_\rho$) gives rise to an illusory LDL basin associated with the
coarsening of ice. To scrutinize this prediction, we examined the density and $Q_6$
autocorrelation functions computed in the LDL region, using the various sampling
techniques employed in our study. Extended Data Fig. 2 shows representative autocorrelation
functions for three of the sampling techniques. Whereas the density and
$Q_6$ autocorrelation functions exhibit transient behaviour at short times, where they
are separated by more than one order of magnitude, Extended Data Fig. 2 clearly
shows that such short-time behaviour is sensitive to the sampling technique and
therefore does not provide a physically meaningful description of the coupling between
density and structural relaxations in the system. For each sampling technique, we find
that density and $Q_6$ fluctuations decay in tandem at long times. It is this technique-independent,
long-time behaviour that is relevant to sampling the physical properties
of the system. Hence, our results demonstrate that Limmer and Chandler's theory
can only be justified if the long-time behaviour is completely neglected by defining
the relaxation time, for instance, as $C(\tau) = e^{-1}$. Although this definition can in
general be used to estimate $\tau$, a more careful analysis is required when comparing
correlation functions that are decoupled at shorter times but invariably decay
together at long times. The physically relevant long-time behaviour may be captured
by using a different metric, such as the integrated autocorrelation time. Using this
definition, we find that $\tau_\rho \approx \tau_{Q_6}$ for each sampling method listed in Extended Data
Table 1. Moreover, by re-sampling our data using an interval equal to the maximum
statistical inefficiency in each window, $g = 1 + 2\max(\tau_\rho, \tau_{Q_6})$, we have excluded
the possibility that transient, short-time correlations are embedded in the free-energy
surfaces shown in Extended Data Fig. 1 and Figs 1 and 3. We also do not
observe significant changes in the free-energy surface shown in Fig. 1 even when the
data are re-sampled using an interval of 10°$g$. Consequently, the presence of an LDL
basin cannot be attributed to finite-time artefacts associated with transient behaviour
occurring on timescales that are orders of magnitude shorter than the sampling
interval.
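The two relaxation-time definitions contrasted above, and the re-sampling rule built on the integrated one, can be sketched as follows. This is a minimal sketch, assuming numpy and the convention $\tau = \sum_{t \geq 1} C(t)$, so that $g = 1 + 2\tau$; truncating the sum at the first non-positive $C(t)$ is our choice of estimator, not necessarily the authors', and all function names are hypothetical.

```python
import numpy as np


def autocorrelation(x):
    """Normalized autocorrelation function C(t) of a 1-D time series (FFT based)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    spectrum = np.fft.rfft(dx, n=2 * n)              # zero-pad to avoid wrap-around
    acov = np.fft.irfft(spectrum * np.conj(spectrum), n=2 * n)[:n]
    acov /= np.arange(n, 0, -1)                      # unbiased estimate for each lag
    return acov / acov[0]


def integrated_autocorrelation_time(x):
    """tau = sum_{t >= 1} C(t), truncated at the first non-positive C(t)."""
    tau = 0.0
    for c in autocorrelation(x)[1:]:
        if c <= 0.0:
            break
        tau += c
    return tau


def one_over_e_time(x):
    """Alternative definition: first lag at which C(t) falls below 1/e."""
    below = np.flatnonzero(autocorrelation(x) < np.exp(-1.0))
    return int(below[0]) if below.size else None


def statistical_inefficiency(*series):
    """g = 1 + 2 max(tau_rho, tau_Q6, ...) over the supplied observables."""
    return 1.0 + 2.0 * max(integrated_autocorrelation_time(s) for s in series)


def subsample(series, g):
    """Keep every ceil(g)-th sample so that retained samples are roughly uncorrelated."""
    stride = max(1, int(np.ceil(g)))
    return np.asarray(series)[::stride]
```

For a single exponential the two definitions differ only by a constant factor; the distinction matters when a correlation function has a fast transient on top of a slow tail, because the 1/e time is set by the transient while the integrated time is dominated by the tail.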

Thermodynamic consistency. The free-energy surface in Fig. 1 shows that the
coexisting liquids are metastable with respect to ice Ic at 228.6 K and 2.4 kbar, with
the ice phase being lower in free energy by ~75 $k_BT$ (in extensive units for N = 192)
or, equivalently, $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}$ = −742 J mol⁻¹. In contrast, Limmer and Chandler's free-energy
calculations predict ice Ic–liquid coexistence at a nearby state condition
(230 K and 2.6 kbar) for the same variant of the ST2 water model (see the middle
column in Fig. 13 of ref. 23). To resolve this significant discrepancy, we have used
thermodynamic integration (TI) along with an empirical equation of state (EEOS)
parameterized to reproduce the experimental properties of water and ice, to estimate
$\Delta G_{\mathrm{Ic-L}}$ under comparable state conditions for water (thus allowing us to subject
both our results and those of Limmer and Chandler to thermodynamic scrutiny),
and to estimate the melting temperature for the ST2 model at 2.6 kbar (thus allowing
us to test the very different predictions for the equilibrium melting temperature of
ice Ic at 2.6 kbar in the ST2 model).
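The conversion between the extensive free-energy difference for N molecules (in $k_BT$) and the molar value used in the tables is $\Delta G = \Delta F \, RT/N$. The sketch below is a minimal illustration with hypothetical function names; it reproduces the pairing of ~75 $k_BT$ with −742 J mol⁻¹ quoted above for N = 192 at 228.6 K.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def kbt_to_j_per_mol(delta_f_kbt, temperature, n_molecules):
    """Extensive free-energy difference for N molecules (kBT) -> J per mole of molecules."""
    return delta_f_kbt * R * temperature / n_molecules


def j_per_mol_to_kbt(delta_g, temperature, n_molecules):
    """Molar free-energy difference (J mol^-1) -> extensive value for N molecules (kBT)."""
    return delta_g * n_molecules / (R * temperature)


if __name__ == "__main__":
    print(round(kbt_to_j_per_mol(-75.0, 228.6, 192)))        # -742
    print(round(j_per_mol_to_kbt(705.0, 228.6, 192), 1))     # 71.2
```

The same conversion shows that a molar difference of about 705 J mol⁻¹ corresponds to roughly 70 $k_BT$ for the 192-molecule system.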

Thermodynamic integration was performed using the identity 


$$\Delta G_{\mathrm{Ic-L}}(P,T) = \Delta G_{\mathrm{Ih-L}}(P,T) + \Delta G_{\mathrm{Ic-Ih}}(P,T) \qquad (4)$$


where subscripts Ic, Ih and L denote ice Ic, ice Ih and the liquid phase, respectively. 
Two levels of TI were considered for evaluating $\Delta G_{\mathrm{Ih-L}}(P,T)$:

(i) A simple linear extrapolation (LE) using experimental data for the specific
volume ($\Delta V^0_{\mathrm{Ih-L}}$) and entropy ($\Delta S^0_{\mathrm{Ih-L}}$) change upon melting for ice Ih at 1 bar:


$$\Delta G_{\mathrm{Ih-L}}(P,T) = \Delta V^0_{\mathrm{Ih-L}}\,(P - P_m^0) - \Delta S^0_{\mathrm{Ih-L}}\,(T - T_m^0) \qquad (5)$$


where $P_m^0$ = 1 bar and $T_m^0$ is the melting temperature at 1 bar.

(ii) The empirical equation of state (EEOS) developed in refs 44, 45, which is applicable
over the ranges 0–22 kbar and 175–360 K and accurately describes the phase
behaviour of liquid water and several ice polymorphs, including ice Ih.

The difference in free energy between ices Ic and Ih, $\Delta G_{\mathrm{Ic-Ih}}$, was calculated from
experimental vapour pressure data for these ice phases and the enthalpy difference,
$\Delta H_{\mathrm{Ic-Ih}}$, measured by calorimetry. The ice phases were assumed to be incompressible,
which is justified by the fact that their specific volumes are relatively insensitive
to pressure. Because the ST2 water model is over-structured in comparison
with real water, it has a melting temperature $T_m^{\mathrm{Ih,ST2}}$ = 300 K for ice Ih at 1 bar (ref. 51),
which is significantly higher than $T_m^0$ for water. Two different approaches were
used to account for this behaviour:

(i) A melting temperature of $T_m^0 = T_m^{\mathrm{Ih,ST2}}$ was assumed for ice Ih at 1 bar.

(ii) Thermodynamic integration calculations were performed at the same supercooling,
$\Delta T^{\mathrm{sc}} = T_m - T$, with respect to the melting temperature of ice Ih at 1 bar.

Our simulations at 228.6 K, for example, were conducted at a supercooling of
$\Delta T^{\mathrm{sc}}$ = 71.4 K with respect to $T_m^{\mathrm{Ih,ST2}}$. In the second approach, the TI was therefore
performed from $T_m^0$ = 273.15 K to $T = T_m^0 - \Delta T^{\mathrm{sc}}$ = 201.75 K to achieve the
same supercooling for real water.
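Equation (5) and the supercooling matching in approach (ii) can be sketched as follows. This is a minimal illustration, assuming approximate literature values for ice Ih and liquid water at 273.15 K and 1 bar (densities 0.9167 and 0.99984 g cm⁻³, enthalpy of melting 6,007 J mol⁻¹); these constants are our inputs, not values taken from the text. The sketch evaluates only the $\Delta G_{\mathrm{Ih-L}}$ term of equation (4); the $\Delta G_{\mathrm{Ic-Ih}}$ term from vapour-pressure and calorimetric data is not included, so its output is not directly comparable with Extended Data Table 2. Function names are hypothetical.

```python
M_WATER = 18.015      # g mol^-1
RHO_ICE_IH = 0.9167   # g cm^-3, ice Ih at 273.15 K and 1 bar (approximate)
RHO_LIQUID = 0.99984  # g cm^-3, liquid water at 273.15 K and 1 bar (approximate)
DH_FUS = 6007.0       # J mol^-1, enthalpy of melting of ice Ih at 1 bar (approximate)
T_M_WATER = 273.15    # K, T_m^0 for real water
P_M = 1.0             # bar, P_m^0

# Changes expressed as (ice Ih) minus (liquid), as in equation (5)
DV_IH_L = (M_WATER / RHO_ICE_IH - M_WATER / RHO_LIQUID) * 1e-6   # m^3 mol^-1 (> 0)
DS_IH_L = -DH_FUS / T_M_WATER                                     # J mol^-1 K^-1 (< 0)


def dg_ih_l_linear(p_bar, t_kelvin, t_m_ref):
    """Equation (5): G(ice Ih) - G(liquid) in J mol^-1 by linear extrapolation."""
    return DV_IH_L * (p_bar - P_M) * 1e5 - DS_IH_L * (t_kelvin - t_m_ref)


def matched_supercooling_temperature(t_sim, t_m_model, t_m_water=T_M_WATER):
    """Approach (ii): real-water temperature with the same supercooling as the model."""
    return t_m_water - (t_m_model - t_sim)


if __name__ == "__main__":
    t_ii = matched_supercooling_temperature(228.6, 300.0)
    print(f"approach (ii) target temperature: {t_ii:.2f} K")
    print(f"approach (i):  dG_Ih-L = {dg_ih_l_linear(2400.0, 228.6, 300.0):.0f} J/mol")
    print(f"approach (ii): dG_Ih-L = {dg_ih_l_linear(2400.0, t_ii, T_M_WATER):.0f} J/mol")
```

Under LE the two approaches give the same $\Delta G_{\mathrm{Ih-L}}$, because equation (5) depends on temperature only through $T - T_m^0$, which is $-\Delta T^{\mathrm{sc}}$ in both cases.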

Extended Data Table 2 shows the values of $\Delta G_{\mathrm{Ic-L}}$ predicted by LE and the EEOS
for water, along with our $\Delta G_{\mathrm{Ic-L}}$ calculation for the ST2 model at 228.6 K and 2.4 kbar
obtained from Fig. 1. Although LE predicts the largest $\Delta G_{\mathrm{Ic-L}}$ in magnitude, owing to the
assumption of incompressibility, it provides a reasonable order-of-magnitude estimate for this
quantity. The more accurate EEOS, which accounts for the changes in the thermodynamic
response functions of the liquid as a function of $T$ and $P$, predicts that $\Delta G_{\mathrm{Ic-L}}$ is
smaller in magnitude by a factor of ~2 than the estimate obtained using LE. Because the ice phase
produced by freezing LDL contains natural imperfections, the predicted $\Delta G_{\mathrm{Ic-L}}$ for
ST2 underestimates the difference in free energy that would be computed using an
ideal ice Ic crystal prepared by artificial means. Defects in the crystal may also arise
because the number of molecules in our simulations is not commensurate with a
cubic supercell of ice Ic. Despite such defects, however, we find that our $\Delta G_{\mathrm{Ic-L}}$ value
for the ST2 model is in reasonable agreement with the thermodynamic analysis,
regardless of the approach used to compute or to assign the reference temperature in
the equation-of-state calculations. In contrast, Extended Data Table 3 shows that
Limmer and Chandler's simulations, purportedly for the same ST2 variant and at
a nearby state condition (2.2 kbar and 230 K), predict that $\Delta G_{\mathrm{Ic-L}}$ is an order of
magnitude smaller than the values calculated by TI using LE and the EEOS. In fact,
we find similar disagreement between the TI calculations and the $\Delta G_{\mathrm{Ic-L}}$ values
estimated from the free-energy surfaces reported by Limmer and Chandler, even
for the other ST2 variants considered in their studies.

Limmer and Chandler observed ice Ic–liquid coexistence (that is, $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}} \approx 0$) at
230 K and 2.6 kbar for the same variant of the ST2 water model examined in our study
(see the middle column of Fig. 13 in ref. 23). Reweighting the free-energy surface
shown in Fig. 1 in pressure and using the HDL as a reference, we find $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}} \approx$
−705 J mol⁻¹ at 228.6 K and 2.6 kbar. This value for $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}$ was used along with LE
and the EEOS to predict the melting temperature of ice Ic for the ST2 water model
($T_m^{\mathrm{Ic,ST2}}$), providing an estimate of the temperature at which our simulations should
be performed to find ice Ic–liquid coexistence at 2.6 kbar. Starting from the initial
temperatures ($T_1$) listed in Extended Data Table 2, the LE and EEOS expressions for
$\Delta G_{\mathrm{Ic-L}}$ were integrated at 2.6 kbar to find the temperature, $T_2$, satisfying


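$$\int_{T_1}^{T_2} \left(\frac{\partial \Delta G_{\mathrm{Ic-L}}}{\partial T}\right)_P \mathrm{d}T \approx \int_{228.6\,\mathrm{K}}^{T_m^{\mathrm{Ic,ST2}}} \left(\frac{\partial \Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}}{\partial T}\right)_P \mathrm{d}T = 705\ \mathrm{J\,mol^{-1}} \qquad (6)$$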


We note that $T_1$ and $T_2$ are defined with respect to either ST2's melting temperature
for ice Ih at 1 bar (that is, $T_m^0 = T_m^{\mathrm{Ih,ST2}}$) or the supercooling, $\Delta T^{\mathrm{sc}}$, as described above. Thus,
$T_m^{\mathrm{Ic,ST2}} \approx T_2$ for calculations performed using $T_m^0 = T_m^{\mathrm{Ih,ST2}}$, whereas $T_m^{\mathrm{Ic,ST2}} \approx T_2 +
(T_m^{\mathrm{Ih,ST2}} - T_m^0)$ for the latter scenario, where $T_m^{\mathrm{Ih,ST2}} - T_m^0$ is the difference between
the melting temperatures of ice Ih at 1 bar for the ST2 model and real water.
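The root-finding implied by equation (6) can be sketched as follows. This is a minimal illustration, assuming that under LE $(\partial\Delta G_{\mathrm{Ic-L}}/\partial T)_P \approx -\Delta S^0_{\mathrm{Ih-L}}$, that is, neglecting the temperature dependence of $\Delta G_{\mathrm{Ic-Ih}}$, and using an approximate enthalpy of melting of 6,007 J mol⁻¹ (our input, not the authors'). A slope function from an EEOS could be substituted for `le_slope`; none is reproduced here. All names are hypothetical.

```python
import numpy as np

DH_FUS = 6007.0      # J mol^-1, enthalpy of melting of ice Ih at 1 bar (approximate)
T_M_WATER = 273.15   # K
SLOPE_LE = DH_FUS / T_M_WATER   # -dS_Ih-L, J mol^-1 K^-1; constant under LE


def integral(slope, t1, t2, n=2001):
    """Trapezoidal estimate of the integral of slope(T) from t1 to t2."""
    t = np.linspace(t1, t2, n)
    y = slope(t)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def solve_upper_limit(slope, t1, target=705.0, t_max=400.0, tol=1e-6):
    """Find T2 such that the integral from t1 to T2 equals target (slope > 0 assumed)."""
    if integral(slope, t1, t_max) < target:
        raise ValueError("target not reached below t_max")
    lo, hi = t1, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integral(slope, t1, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def le_slope(t):
    """(d dG_Ic-L / dT)_P under LE, neglecting the T dependence of dG_Ic-Ih (assumption)."""
    return np.full_like(t, SLOPE_LE)


if __name__ == "__main__":
    t_m_ih_st2 = 300.0
    # Approach (i): T1 = 228.6 K and T_m^Ic,ST2 ~ T2
    tm_i = solve_upper_limit(le_slope, 228.6)
    # Approach (ii): T1 = 201.75 K and T_m^Ic,ST2 ~ T2 + (T_m^Ih,ST2 - T_m^0)
    tm_ii = solve_upper_limit(le_slope, 201.75) + (t_m_ih_st2 - T_M_WATER)
    print(f"{tm_i:.1f} K, {tm_ii:.1f} K")
```

With these assumed constants both routes give roughly 261 K, within about 1 K of the LE entries in Extended Data Table 4; the EEOS entries require the full equation of state.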

Extended Data Table 4 lists the estimates of $T_m^{\mathrm{Ic,ST2}}$ at 2.6 kbar obtained using the
same procedures and reference temperatures as those reported in Extended Data
Table 2. The LE predicts $T_m^{\mathrm{Ic,ST2}} \approx$ 260 K, whereas calculations with the more accurate
EEOS estimate $T_m^{\mathrm{Ic,ST2}}$ in the range 272–276 K at 2.6 kbar. To confirm these
predictions, we computed $T_m^{\mathrm{Ic,ST2}}$ directly from simulation, using two different techniques.
In the first approach, $T_m^{\mathrm{Ic,ST2}}$ was determined using two-phase, ice Ic–liquid
($N$, $P_z$, $T$) MC simulations, imposing a pressure of 2.6 kbar in the direction perpendicular
to the ice Ic–liquid interface. Extended Data Fig. 3 shows the time evolution
of the crystalline order parameter, $Q_6$, for simulations performed at different temperatures
near the $T_m^{\mathrm{Ic,ST2}}$ value predicted by the EEOS. Below 270 K, the simulations
exhibited a gradual drift towards higher values of $Q_6$, indicating that the system was
freezing. Similarly, $Q_6$ decreased for simulations performed above 275 K because of
the melting of ice. Our estimate of the melting temperature is therefore the average of
these temperatures, $T_m^{\mathrm{Ic,ST2}} \approx$ 273 ± 3 K at 2.6 kbar, which is in excellent agreement
with the range 272–276 K predicted using the EEOS. We also computed the $\rho$–$Q_6$
free-energy surface at 275 K and 2.2 kbar for N = 216 ST2 water molecules using the
umbrella sampling procedure described above. Extended Data Fig. 4 shows the
resulting $\rho$–$Q_6$ free-energy surface after reweighting in pressure using equation (2)
to locate the point of ice Ic–liquid coexistence, 275 K and ~2.7 kbar. As Extended
Data Table 4 shows, this result is in excellent agreement with our thermodynamic
calculations using the EEOS and interfacial simulations. Such values are 30–46 K
higher than the $T_m^{\mathrm{Ic,ST2}}$ predicted by Limmer and Chandler at the same pressure,
demonstrating that those free-energy calculations are inconsistent with reasonable
thermodynamic expectations based on accurate equations of state for real water and
the established physical properties of the ST2 water model.
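The bracketing of the melting temperature from the drift of $Q_6$ in the two-phase runs can be sketched as follows. This is a minimal illustration, assuming the drift is summarized by a least-squares slope of $Q_6$ against time and classified with a hypothetical `threshold` parameter (zero by default); a real analysis would also need to judge whether a drift is statistically significant. The synthetic series in the usage note are invented only to show the call pattern.

```python
import numpy as np


def drift_rate(times, q6):
    """Least-squares slope of Q6 against time."""
    slope, _intercept = np.polyfit(times, q6, 1)
    return slope


def bracket_melting_temperature(runs, threshold=0.0):
    """Bracket T_m from two-phase runs.

    runs maps temperature -> (times, q6 series). A run counts as freezing if Q6
    drifts up faster than `threshold` and as melting if it drifts down faster
    than `threshold`. Returns (highest freezing T, lowest melting T, midpoint,
    half-width).
    """
    rates = {temp: drift_rate(t, q) for temp, (t, q) in runs.items()}
    freezing = [temp for temp, r in rates.items() if r > threshold]
    melting = [temp for temp, r in rates.items() if r < -threshold]
    if not freezing or not melting:
        raise ValueError("need at least one freezing and one melting run")
    t_freeze, t_melt = max(freezing), min(melting)
    if t_freeze >= t_melt:
        raise ValueError("freezing observed above a melting run; bracket is inconsistent")
    return t_freeze, t_melt, 0.5 * (t_freeze + t_melt), 0.5 * (t_melt - t_freeze)
```

For example, synthetic runs that freeze at 265 and 270 K and melt at 275 and 280 K give a bracket of 270–275 K with midpoint 272.5 K and half-width 2.5 K.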

We have shown that the free-energy surface shown in Fig. 1 is consistent with
expectations based on thermodynamic arguments. This is demonstrated by the fact
that our estimate of $\Delta G_{\mathrm{Ic-L}}$ for the ST2 model at 228.6 K and 2.4 kbar is in good agreement
with calculations performed using the accurate EEOS for water. In addition,
we have also demonstrated thermodynamic consistency by using the EEOS along
with our $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}$ value at 228.6 K and 2.6 kbar to predict $T_m^{\mathrm{Ic,ST2}} \approx$ 272–276 K. This
prediction was verified by performing simulations of the ice Ic–liquid interface and
umbrella sampling calculations. Such results demonstrate conclusively that $T_m^{\mathrm{Ic,ST2}}$ at
~2.6 kbar is ~40–45 K higher than reported by Limmer and Chandler. It therefore
seems that their free-energy surface (middle column of Fig. 13 in ref. 23) is distorted
to such an extent that the output of their simulations corresponds to an effectively
higher temperature. To observe ice Ic–liquid coexistence at 2.6 kbar, as reported
by Limmer and Chandler, this effective temperature would have to be well above
the estimated liquid–liquid critical temperature ($T_c \approx$ 237 K for our model) for any
reasonable variant of the ST2 water model, explaining the absence of an LDL basin in
their free-energy surfaces. Because the two liquids are only separated by a ~4 $k_BT$
barrier at 228.6 K and 2.4 kbar, the free-energy surface must be accurately computed
to observe the LDL basin. At odds with this requirement, we find a ~70 $k_BT$ discrepancy
between our respective estimates for $\Delta G_{\mathrm{Ic-L}}$ near 228.6 K and 2.6 kbar,
which cannot simply be dismissed as non-equilibrium artefacts, as suggested by
Limmer and Chandler. Although the precise numerical origin of this discrepancy
is still under investigation, we showed above (see the section on Consistency among
sampling methods, and Extended Data Fig. 1) that liquid–liquid coexistence is
observed when we perform free-energy calculations using the hybrid MC technique
employed by Limmer and Chandler. In our hybrid MC implementation we use the
molecular dynamics integrator of Miller et al., whereas Limmer and Chandler
employed the constraint algorithm SETTLE to simulate rigid ST2 water molecules.
Although we have not yet implemented this integrator, Reinhardt et al. recently
observed 'catastrophic' divergence from the well-established equation of state for
the TIP4P/2005 water model when hybrid MC simulations were performed with
SETTLE. A more comprehensive discussion of the different perspectives regarding
the liquid–liquid phase transition in ST2 water, computational approaches and related
studies has recently been published.

As a final check, we followed the procedure described by Hunter and Reinhardt
to estimate the liquid–liquid surface tension, $\gamma_{\mathrm{L-L}}$, from our finite-size scaling data.
We find that $\gamma_{\mathrm{L-L}} \approx$ 2 mJ m⁻², which is comparable to vapour–liquid surface tensions
for various water models at similar reduced temperatures near the vapour–liquid
critical point (that is, $\gamma_{\mathrm{V-L}} \approx$ 5.6–1.5 mJ m⁻² for $T/T_c \approx$ 0.95–0.98), and an order of
magnitude smaller than the $\gamma_{\mathrm{I-L}} \approx$ 23 mJ m⁻² reported by Handel et al. for the
ice Ih–liquid surface tension in TIP4P. Thus, the small value of $\gamma_{\mathrm{L-L}}$ is thermodynamically
consistent with our observation that two liquids are forming an interface, not
a liquid and a coarsening crystal, and with the fact that our simulations are performed
relatively close to the estimated liquid–liquid critical point at a reduced
temperature of $T/T_c \approx$ 0.96.
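The text does not spell out Hunter and Reinhardt's procedure. As a hedged sketch, the code below assumes only its simplest leading-order form: in a periodic cubic box of edge $L$ at coexistence, the barrier is dominated by two planar interfaces, so $\beta\Delta F \approx 2\beta\gamma L^2 + c$, and $\gamma$ follows from a linear fit of the barrier against $L^2$. Logarithmic and higher-order finite-size corrections are neglected, the box density `rho_g_cm3` is an assumed input, and the function names are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23     # J K^-1
N_A = 6.02214076e23    # mol^-1
M_WATER = 18.015       # g mol^-1


def box_length_nm(n_molecules, rho_g_cm3):
    """Edge of a cubic box holding N water molecules at mass density rho."""
    volume_cm3 = n_molecules * M_WATER / N_A / rho_g_cm3
    return (volume_cm3 ** (1.0 / 3.0)) * 1e7   # cm -> nm


def surface_tension(n_molecules, barriers_kbt, rho_g_cm3, temperature):
    """Fit beta*dF = 2*beta*gamma*L^2 + c and return gamma in mJ m^-2.

    Leading-order form for two planar interfaces in a periodic cubic box;
    logarithmic and higher-order finite-size corrections are neglected.
    """
    l2_nm2 = np.array([box_length_nm(n, rho_g_cm3) ** 2 for n in n_molecules])
    slope, _c = np.polyfit(l2_nm2, np.asarray(barriers_kbt, dtype=float), 1)
    gamma_j_m2 = 0.5 * slope * K_B * temperature * 1e18   # kBT nm^-2 -> J m^-2
    return gamma_j_m2 * 1e3
```

Called with the system sizes used here (N = 192, 300, 400 and 600) and the corresponding coexistence barriers, the function returns a single fitted $\gamma_{\mathrm{L-L}}$; for N = 192 at about 1 g cm⁻³ the box edge is roughly 1.8 nm.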
Extended Data Figure 1 | Reversible free-energy surfaces at 228.6 K and 2.4 kbar computed using different sampling techniques. Surfaces on the top row were computed using (from left to right) umbrella sampling MC, well-tempered metadynamics and unconstrained MC; the bottom row shows results from hybrid MC, parallel tempering MC and Hamiltonian exchange MC simulations. The free-energy barrier separating the liquid basins is ~4 $k_BT$ for all of the surfaces shown. Contours are 1 $k_BT$ apart and uncertainties are estimated to be less than 0.5 $k_BT$.
Extended Data Figure 2 | Autocorrelation functions for different sampling techniques. Autocorrelation functions for density (blue) and $Q_6$ (red) computed in the LDL region using unconstrained MC (left), hybrid MC (centre) and Hamiltonian exchange MC (right). The correlation functions were calculated by averaging results from at least 12 independent simulations. Density and $Q_6$ fluctuations decay on very similar timescales, despite exhibiting technique-dependent transient behaviour where these processes may be separated by more than one order of magnitude.
Extended Data Figure 3 | Time evolution of the crystalline order parameter in two-phase MC simulations of the ice Ic–liquid interface at 2.6 kbar. The MC simulations were initiated from configurations containing 512 and 670 ST2 water molecules in the ice Ic and liquid phases, respectively. The x and y dimensions of the simulation cells were fixed in accord with the lattice constant for ice Ic, which was determined at each temperature by performing a separate calculation for the bulk ice phase, while the z dimension was allowed to fluctuate so as to impose a constant pressure of 2.6 kbar perpendicular to the ice–liquid interface. Drift of $Q_6$ towards higher or lower values indicates that the system is freezing or melting. The melting temperature of 273 ± 3 K at 2.6 kbar was estimated by averaging the lowest and highest temperatures, respectively, at which melting and freezing were observed.

[Figure: time evolution of $Q_6$; horizontal axis t (10* MC moves)]
Extended Data Figure 4 | Reversible free-energy surface at 275 K and 2.7 kbar demonstrating ice Ic–liquid coexistence. The liquid and ice Ic basins have equal depths with respect to the saddle point, indicating that the reported state condition is a point of coexistence. Such results confirm the estimates of the melting temperature for ice Ic at 2.6 kbar obtained from TI calculations using the EEOS and the two-phase MC simulations of the ice–liquid interface. Contours are 1 $k_BT$ apart.
Extended Data Table 1 | Sampling methods

Method | $\tau_\rho$, $\tau_{Q_6}$ (MC moves) | Sampling duration ($\tau_{Q_6}$)
Umbrella sampling MC|| | 5 × 10°, 5 × 10° | 10?
Well-tempered metadynamics | 5 × 10°, 5 × 10° | 10?
Unconstrained MC | 10°, 10° | 10?
Hybrid MC‡ + umbrella sampling | 10°, 104 | 5 × 10°
Parallel tempering MC§ + umbrella sampling | 10°, 10° | ~104
Hamiltonian exchange MC + umbrella sampling | 10°, 10° | ~104

* Collective, smart MC moves used.
† Relaxation times estimated from unbiased simulations using the same types of MC moves.
‡ Rigid-body integrator of Miller et al.; ~10 molecular dynamics integration steps per MC move.
§ Eight replicas spaced between 228.6 and 272 K.
|| Bi-directional sampling performed between the LDL and crystal to ensure reversibility.

State-of-the-art sampling methods used to perform free-energy analysis, along with integrated autocorrelation times for density and the crystalline order parameter $Q_6$ ($\tau_\rho$ and $\tau_{Q_6}$, respectively) computed in the LDL basin at 228.6 K and 2.4 kbar, and the sampling duration in each umbrella sampling window given in terms of $\tau_{Q_6}$.
Extended Data Table 2 | Comparison of ice Ic–liquid free-energy differences obtained from thermodynamic integration and from results presented in the text for the ST2 model

Reference temperature | $\Delta G^{\mathrm{LE}}_{\mathrm{Ic-L}}$ (J mol⁻¹) | $\Delta G^{\mathrm{EEOS}}_{\mathrm{Ic-L}}$ (J mol⁻¹) | $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}$ (J mol⁻¹)
$T_m^0 = T_m^{\mathrm{Ih,ST2}}$ | −1095 | −505 | −742
$\Delta T^{\mathrm{sc}}$ = 71.4 K | −1091 | −604 | −742

Ice Ic–liquid free-energy differences ($\Delta G_{\mathrm{Ic-L}}$) predicted by LE and the EEOS for water are in good agreement with the $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}$ value calculated for the ST2 model at 228.6 K and 2.4 kbar from the data presented in Fig. 1. The TI calculations using LE and the EEOS were performed using two different reference temperatures (described in Methods) to account for ST2's over-structured nature in comparison with real water.
Extended Data Table 3 | Comparison of ice Ic–liquid free-energy differences obtained from thermodynamic integration and from results presented by Limmer and Chandler for the ST2 water model

Reference temperature | $\Delta G^{\mathrm{LE}}_{\mathrm{Ic-L}}$ (J mol⁻¹) | $\Delta G^{\mathrm{EEOS}}_{\mathrm{Ic-L}}$ (J mol⁻¹) | $\Delta G^{\mathrm{ST2}}_{\mathrm{Ic-L}}$* (J mol⁻¹)
$T_m^0 = T_m^{\mathrm{Ih,ST2}}$ | −1077 | −537 | −66
$\Delta T^{\mathrm{sc}}$ = 70.0 K | −1087 | −636 | −66

* Estimated from Fig. 5(b) of ref. 23.

Ice Ic–liquid free-energy differences ($\Delta G_{\mathrm{Ic-L}}$) predicted by LE and the EEOS for water are in poor agreement with the $\Delta G_{\mathrm{Ic-L}}$ value obtained by Limmer and Chandler for the ST2 model at 230 K and 2.2 kbar. Such disagreement demonstrates that Limmer and Chandler's results do not withstand thermodynamic scrutiny and fail to provide a reasonable description of ST2's phase behaviour. The TI calculations using LE and the EEOS were performed using two different reference temperatures (described in Methods) to account for ST2's over-structured nature in comparison with real water.
Extended Data Table 4 | Estimates of the melting temperature for ice Ic at 2.6 kbar for the ST2 water model

Method | $T_m^{\mathrm{Ic,ST2}}$ (K)
TI (LE, $T_m^0 = T_m^{\mathrm{Ih,ST2}}$) | 260
TI (LE, $\Delta T^{\mathrm{sc}}$) | 260
TI (EEOS, $T_m^0 = T_m^{\mathrm{Ih,ST2}}$) | 276
TI (EEOS, $\Delta T^{\mathrm{sc}}$) | 272
Interfacial ice Ic–liquid simulation (this work) | 273
Umbrella sampling (this work)* | 275
Umbrella sampling (Limmer & Chandler)† | 230

* Coexistence pressure is 2.7 kbar.
† Estimated from Fig. 13 (middle column) of ref. 23.

Comparison of melting temperature estimates for ice Ic at 2.6 kbar for the ST2 water model calculated using the TI schemes and empirical equations of state for water described in Methods. The estimates of $T_m^{\mathrm{Ic,ST2}}$ obtained from TI using the accurate EEOS of Choukroun and Grasset are in excellent agreement with values computed directly from two-phase MC simulations of the ice Ic–liquid interface and umbrella sampling MC simulations. In contrast, the $T_m^{\mathrm{Ic,ST2}}$ at 2.6 kbar estimated from Limmer and Chandler's umbrella sampling simulations with the ST2 water model is lower by more than 40 K, demonstrating severe thermodynamic inconsistencies with their free-energy calculations.
Possible control of subduction zone slow-earthquake periodicity by silica enrichment

Pascal Audet¹ & Roland Bürgmann²


Seismic and geodetic observations in subduction zone forearcs indicate
that slow earthquakes, including episodic tremor and slip, recur
at intervals of less than six months to more than two years. In
Cascadia, slow slip is segmented along strike and tremor data show
a gradation from large, infrequent slip episodes to small, frequent
slip events with increasing depth of the plate interface. Observations
and models of slow slip and tremor require the presence of near-lithostatic
pore-fluid pressures in slow-earthquake source regions;
however, direct evidence of factors controlling the variability in
recurrence times is elusive. Here we compile seismic data from
subduction zone forearcs exhibiting recurring slow earthquakes and
show that the average ratio of compressional (P)-wave velocity to shear
(S)-wave velocity (Vp/Vs) of the overlying forearc crust ranges between
1.6 and 2.0 and is linearly related to the average recurrence time of slow
earthquakes. In northern Cascadia, forearc Vp/Vs values decrease with
increasing depth of the plate interface and with decreasing tremor-episode
recurrence intervals. Low Vp/Vs values require a large addition
of quartz in a mostly mafic forearc environment. We propose that
silica enrichment varying from 5 per cent to 15 per cent by volume
from slab-derived fluids and upward mineralization in quartz veins
can explain the range of observed Vp/Vs values as well as the downdip
decrease in Vp/Vs. The solubility of silica depends on temperature,
and deposition prevails near the base of the forearc crust. We further
propose that the strong temperature dependence of healing and permeability
reduction in silica-rich fault gouge via dissolution–precipitation
creep can explain the reduction in tremor recurrence time with
progressive silica enrichment. Lower gouge permeability at higher
temperatures leads to faster fluid overpressure development and low
effective fault-normal stress, and therefore shorter recurrence times.
Our results also agree with numerical models of slip stabilization under
fault zone dilatancy strengthening caused by decreasing fluid pressure
as pore space increases. This implies that temperature-dependent
silica deposition, permeability reduction and fluid overpressure development
control dilatancy and slow-earthquake behaviour.

Slow earthquakes—comprising slow fault slip, often accompanied 
by low-frequency tremor (also called episodic tremor and slip, or ETS)— 
generally recur at regular intervals on the plate interface within the 
forearc of young and warm subduction zones, downdip of the locked 
zone”. Their association with a dipping layer of extremely low seismic 
S-wave velocity, interpreted to represent near-lithostatic pore-fluid 
pressure within subducting oceanic crust, has been established in most 
locations”, thus suggesting a link between fault zone hydrology and 
the fault-slip behaviour. Factors controlling ETS periodicity are poorly 
constrained. One possibility involves modulation by periodic external 
forces including seasonal hydrologic loads and the Earth’s 14-month 
pole tides’*'*, but these cannot explain the wide range of observed 
periods. Segmentation of ETS behaviour in Cascadia correlates qua- 
litatively with the overriding forearc structure and geology’; however, 
the exact nature of this relation is elusive. In northern Cascadia, tremor observations indicate a systematic decrease in the recurrence time of slow-slip events with increasing depth of the plate interface*. Wech and Creager’s* conceptual interpretation of this observation involves a decrease in friction with increasing temperature, resulting in a weaker fault that ruptures more frequently at greater depths.

We compile observations of converted teleseismic waves (or receiver functions) from permanent and temporary broadband stations located in the forearc of circum-Pacific subduction zones where slow earthquakes (slow slip with or without tremor) are known to occur at regular intervals, including Japan, Cascadia, Mexico, Costa Rica and New Zealand (Fig. 1a, Extended Data Table 1). At each subduction zone we select stations closest to the inferred slow-earthquake source regions (Extended Data Fig. 1). In Cascadia we consider stations closer to the longer-recurrence ETS events at shallower depths*’’. These data are sensitive to structures with scale lengths of 1–10 km and are dominated by the signature of a dipping, low-velocity layer*’’. The low-velocity zones associated with the slow-earthquake slip areas have very high vp/vs values of 2.6 ± 0.3 (1σ), which have no apparent relationship with the slow-slip recurrence intervals (Fig. 1b). On the other hand, we see a linear and positive relation between the vp/vs of the overriding forearc crust and recurrence times of slow earthquakes (Fig. 1c). The forearc vp/vs results are in agreement with a number of studies that estimate forearc crust vp/vs using different forms of travel-time tomography (Fig. 1c; see Methods), which generally show somewhat lower vp/vs owing to various forms of data regularization and smoothing.
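
For intuition about how receiver functions constrain vp/vs, the sketch below uses the standard flat-layer delay-time relations for the direct Ps conversion and the PpPs and PpSs+PsPs reverberations. It is a simplification under stated assumptions, not the dipping, anisotropic ray method used in this study: the layer is assumed flat and isotropic, and the function name and example values (30 km thickness, ray parameter 0.06 s km⁻¹) are hypothetical; only the 6.5 km s⁻¹ P-wave velocity matches the value fixed in the Methods.

```python
import math

def converted_phase_delays(thickness_km, vp_km_s, vp_vs, ray_param_s_km):
    """Delay times (s) after direct P for a flat, isotropic layer over a half-space.

    Standard flat-layer relations; a simplification of the dipping-layer case.
    """
    if ray_param_s_km >= 1.0 / vp_km_s:
        raise ValueError("ray parameter must be smaller than 1/vp")
    vs_km_s = vp_km_s / vp_vs
    eta_s = math.sqrt(vs_km_s ** -2 - ray_param_s_km ** 2)  # vertical S slowness
    eta_p = math.sqrt(vp_km_s ** -2 - ray_param_s_km ** 2)  # vertical P slowness
    return {
        "Ps": thickness_km * (eta_s - eta_p),
        "PpPs": thickness_km * (eta_s + eta_p),
        "PpSs+PsPs": 2.0 * thickness_km * eta_s,
    }

# Hypothetical layer: 30 km thick, vp fixed at 6.5 km/s, two candidate vp/vs values
for ratio in (1.65, 1.85):
    delays = converted_phase_delays(30.0, 6.5, ratio, 0.06)
    print(ratio, {phase: round(t, 2) for phase, t in delays.items()})
```

For a fixed thickness, a higher vp/vs lengthens all three delays; combining the direct conversion with the reverberations is what allows thickness and vp/vs to be separated.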

Following previous studies, we interpret the high vp/vs values of the 
low-velocity layer to represent high, near-lithostatic pore-fluid pressure 
within subducting upper oceanic crust*’. Elevated pore-fluid pressures 
imply that the plate interface represents a low-permeability boundary, 
presumably caused by mineral precipitation or grain size reduction. The 
ubiquity of overpressured oceanic crust in slow-earthquake source re- 
gions of warm subduction zones suggests that low effective stress on 
the megathrust is a necessary condition for the occurrence of slow earth- 
quakes. The observed scatter in the low-velocity-zone vp/vs values and 
the absence of a relationship between vp/vs and recurrence times may 
indicate that the measured vp/vs values are only a snapshot of more 
dynamic and possibly fast-changing fluid processes within the oceanic 
crust. Exploring temporal variations in vp/vs may capture such processes, 
should these be resolvable. The seismic velocities of the overlying forearc 
crust seem to provide better constraints on the time-integrated effects of 
fluid flow and accumulation and the associated transport and precip- 
itation of silica’, which apparently correlate with the slow-slip behaviour. 
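
Whether vp/vs tracks recurrence time (as for the forearc crust) or not (as for the low-velocity zone) can be quantified with the usual t statistic for a Pearson correlation, t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, which underlies the two-tailed test quoted in the Fig. 1 caption. A minimal NumPy sketch, assuming paired arrays of vp/vs and recurrence time; the function name and the five example points are hypothetical, not the compiled data.

```python
import math
import numpy as np

def correlation_t_test(x, y):
    """Pearson r, its t statistic and degrees of freedom for H0: no linear correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    r = float(np.corrcoef(x, y)[0, 1])
    dof = x.size - 2
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r), dof
    t = r * math.sqrt(dof / (1.0 - r * r))
    return r, t, dof

# Hypothetical paired values, for illustration only
r, t, dof = correlation_t_test([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(f"r = {r:.2f}, t = {t:.2f}, dof = {dof}")
```

Here t ≈ 2.31 falls below the two-tailed 95% critical value of about 3.18 for 3 degrees of freedom, so five such points would not establish a significant correlation; in general |t| must exceed the tabulated critical value for the actual n − 2.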

The linear relationship between forearc crust vp/vs and the recurrence times of slow earthquakes (Fig. 1c) supports the hypothesis that the structure of the hanging wall of subduction zone forearcs reflects conditions that determine ETS behaviour*”’. We further examine this question by compiling seismic observations of hanging-wall vp/vs along a margin-perpendicular profile in northern Cascadia from published data’’. Values of vp/vs progressively decrease from initially high (>1.85) to low (~1.65) values with increasing depth (from 20 km to 45 km) to the plate interface, indicating that progressively more overlying crust material with low bulk vp/vs is sampled (Fig. 2). Laboratory measurements of vp/vs for most crustal rocks at dry conditions fall in the range
Figure 1 | Subduction zone velocity structure in slow-earthquake source regions. Recurrence times of slow earthquakes (selected regions shown in a; see Methods and Extended Data Table 2 for data sources) compared with vp/vs values of a dipping, low-velocity zone interpreted as subducting oceanic crust (b) and overriding forearc crust (c) (Extended Data Table 1 and Extended Data Fig. 1). Error bars show 1σ uncertainty. High vp/vs values of subducting oceanic crust show a large scatter (1σ zone shown in grey) and are uncorrelated with


of 1.7 to 1.85, with higher values found for more mafic compositions’”. These ratios increase further at wet and slightly overpressured conditions”, and the highest vp/vs values observed in the updip portion of the Cascadia profile may be explained by the presence of hydrated mafic lithologies and low-grade metamorphic facies. The lowest values require increasing proportions of silica-rich minerals and, in particular, quartz*’, characterized by the lowest velocity ratios (vp/vs = 1.50 ± 0.03) determined in the laboratory at ambient temperatures’°”*, which decrease further with increasing temperature up to the α-to-β phase transition (at >600 °C) of quartz”*.
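
Why adding quartz lowers the bulk ratio follows from vp/vs = √(K/G + 4/3), which depends only on the bulk modulus K and shear modulus G (density cancels). The sketch mixes quartz into a host rock with a Voigt–Reuss–Hill average. This is one common mixing rule, not necessarily the one behind the percentages quoted in this study; the quartz moduli are approximate literature values for an isotropic α-quartz aggregate, and the host moduli (chosen to give vp/vs ≈ 1.80) and function names are hypothetical.

```python
import math

# Approximate isotropic alpha-quartz aggregate moduli (GPa)
QUARTZ_K, QUARTZ_G = 37.8, 44.3
# Hypothetical mafic host rock moduli (GPa), giving vp/vs of about 1.80
HOST_K, HOST_G = 76.0, 40.0

def hill_average(fractions, moduli):
    """Voigt-Reuss-Hill average of elastic moduli for given volume fractions."""
    voigt = sum(f * m for f, m in zip(fractions, moduli))
    reuss = 1.0 / sum(f / m for f, m in zip(fractions, moduli))
    return 0.5 * (voigt + reuss)

def vp_vs_with_quartz(quartz_fraction, host_k=HOST_K, host_g=HOST_G):
    """Bulk vp/vs of a host rock containing a given volume fraction of quartz."""
    if not 0.0 <= quartz_fraction <= 1.0:
        raise ValueError("quartz_fraction must lie in [0, 1]")
    fractions = (1.0 - quartz_fraction, quartz_fraction)
    k = hill_average(fractions, (host_k, QUARTZ_K))
    g = hill_average(fractions, (host_g, QUARTZ_G))
    # vp^2 = (K + 4G/3)/rho and vs^2 = G/rho, so density cancels in the ratio
    return math.sqrt(k / g + 4.0 / 3.0)

for pct in (0, 5, 10, 15, 20):
    print(pct, round(vp_vs_with_quartz(pct / 100), 3))
```

With these inputs pure quartz gives vp/vs ≈ 1.48, close to the laboratory value quoted above, and each added percent of quartz pulls the aggregate ratio down from the host value; the quartz fraction needed to reach a given vp/vs depends strongly on the assumed host moduli.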

Quartz-rich rocks are not typically found at those depths in the lower continental crust. Quartz enrichment may be caused by the precipitation of fluid-dissolved silica derived from the progressive dehydration of the downgoing slab. This scenario is supported by seismic", laboratory’ and field evidence” suggesting that quartz deposition becomes progressively more important downdip owing to the temperature dependence of silica solubility in slab-derived fluids’®. Our data support a massive addition of silica to the deep continental forearc crust, which may locally reach 20% quartz by volume”. Fossil examples of abnormally high concentrations of quartz veins include giant mesothermal gold deposits that formed during greenschist facies metamorphism in accretionary complexes”. Although the estimated fluid flux required is about two orders of magnitude greater than fluid production rates estimated from slab dehydration processes in Cascadia” (see Methods), the availability of silica-saturated fluids from the slab may be greatly enhanced by complete serpentinization of the mantle near the wedge corner’. Silica enrichment as a function of subduction zone age, temperature and plate interface depth can explain the observed pattern of vp/vs variations within the overlying crust.

What causes the observed strong correlation between silica enrich- 
ment indicated by the forearc vp/vs data and decreasing slow-earthquake 
recurrence times? We postulate that abundant silica-rich fluids from 


recurrence times. In c the coefficient of correlation is 0.9 and a two-tailed Student’s t-test shows a statistically significant correlation at the 95% level. Also shown in shaded colours are ranges of vp/vs estimates from various seismic tomography studies (Extended Data Table 2). Grey circles show along-dip variations in forearc vp/vs values’! and tremor periodicity for northern Cascadia‘ (Fig. 2). Labels are SI, Siletzia; WR, Wrangellia; KL, Klamath; SR, Southern Ryukyu; WS, Western Shikoku; and EK, Eastern Kii peninsula.


the slab and increased temperatures accelerate the rate of permeability 
reduction in the fault zone, which plays a fundamental part in con- 
trolling fault strength and stability. At the pressure and temperature 
Figure 2 | Downdip vp/vs variations of overlying forearc crust in northern Cascadia. Seismic data (yellow squares; error bars show 1σ uncertainty) are from ref. 21. The linear trend of vp/vs as a function of plate interface depth”’ is used to match the position of the logarithmic decay (solid red line) of slip periodicity along dip*. The blue contours show the probability density function at constant intervals (0.01) for the range of linear regressions obtained from bootstrap analysis of the trend, with dark blue being more probable than pale blue. The shaded grey region outlines the range of linear regressions from the bootstrapped samples (see Methods). Horizontal dashed grey lines show vp/vs and slip periodicity values extracted at data points that were used to fit the exponential trend of tremor periodicity‘, plotted in Fig. 1. The inset shows the location of the vp/vs profile” (yellow box) and the area used for inferring decay of tremor periodicity* (red shaded area).
Figure 3 | Conceptual model of silica enrichment controlling slow-earthquake behaviour in northern Cascadia. Progressive silica deposition into quartz minerals and veins due to higher silica solubility with increasing temperature T at depth enriches the overlying forearc crust and decreases vp/vs values. Following a slow-slip event, permeability k increases, allowing fluid circulation and a reduction in pore-fluid pressure. Strong


conditions corresponding to tremor depths, rapid dissolution-precipitation 
creep processes in a semi-ductile shear zone control permeability reduc- 
tion in quartz gouge™. Slow fault slip produces transient permeability 
increases in fault gouge, followed by resealing’*”””*. Faster healing and 
reduction in permeability may lead to faster recharge and overpressure 
development, and thus more rapid reduction in the effective fault- 
normal stress and more frequent slip events (Fig. 3). Assuming a con- 
stant stressing rate along dip, the apparent decrease in slip size with 
depth depends on stress drop’. Our model therefore suggests that stress 
drop is smaller when permeability reduction and pore-fluid pressure 
buildup increase more rapidly between slow-slip events. Our model is 
also consistent with the downdip initiation and updip migration of both
large and small slow-slip events*, where the updip segment of the plate 
interface does not always react to a downdip slip pulse because it re- 
mains stable at those timescales (that is, fluid overpressure and effec- 
tive stress have not yet reached the critical threshold for slip). However, 
it is likely that slip pulses initiating downdip may bring the system to 
instability when updip conditions are close to critical. 
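
The argument that faster resealing shortens the recurrence interval can be made concrete with a deliberately simple toy model; it illustrates the reasoning and is not a model from this study. It assumes that after each slip event pore pressure (as a fraction of lithostatic) relaxes exponentially back toward a limit at a resealing rate with an Arrhenius temperature dependence, and that slip recurs once pore pressure reaches a critical value, that is, once effective normal stress has fallen to a threshold. The function names, prefactor, activation energy and pressure fractions are all hypothetical.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1

def resealing_rate(temp_c, prefactor_per_yr=1.0e7, activation_j_per_mol=9.0e4):
    """Hypothetical Arrhenius resealing rate (1/yr) of the fault zone."""
    temp_k = temp_c + 273.15
    return prefactor_per_yr * math.exp(-activation_j_per_mol / (R_GAS * temp_k))

def recurrence_time(temp_c, p_after_slip=0.80, p_critical=0.95, p_limit=1.0):
    """Years for pore pressure to climb from p_after_slip back to p_critical.

    Pore pressure (fraction of lithostatic) is assumed to follow
    p(t) = p_limit - (p_limit - p_after_slip) * exp(-rate * t);
    slip recurs when p(t) reaches p_critical.
    """
    if not p_after_slip < p_critical < p_limit:
        raise ValueError("require p_after_slip < p_critical < p_limit")
    rate = resealing_rate(temp_c)
    return math.log((p_limit - p_after_slip) / (p_limit - p_critical)) / rate

for temp in (350.0, 400.0, 450.0):
    print(f"{temp:.0f} C: {recurrence_time(temp):.2f} yr")
```

Because only the rate depends on temperature, the toy recurrence time falls monotonically with depth-dependent temperature, which is the qualitative trend the text attributes to faster permeability reduction.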

This model can explain the relation between silica enrichment in the 
forearc crust and slow-earthquake recurrence time as a function of plate 
interface depth in northern Cascadia (Fig. 3). The control of silica gouge 
on recurrence time may also be applicable to the global (Fig. 1c) and 
regional? observations, if the absolute amount of silica enrichment is a 
proxy for conditions (temperature, fluids and fault zone mineralogy) 
that govern creep processes and healing rate in the fault zone. In Cascadia, 
ETS recurrence-time segmentation correlates with the geology of the 
forearc crust, where the younger, more mafic (silica-poor) Siletzia terrane has the highest vp/vs ratios and longest recurrence times compared
with the older, more felsic (silica-rich) forearc terranes to the north and 
south with shorter recurrence times’. These observations are qualita- 
tively consistent with a control of ETS behaviour by time-integrated 
and temperature-dependent silica enrichment of the forearc crust. 

Numerical simulations of megathrust fault slip suggest that dila- 
tancy strengthening plays an important role in controlling slow-slip 
behaviour”. In areas of strong dilatancy, pore opening is faster than 


temperature dependence of permeability reduction in quartz gouge leads to faster resealing, overpressure development (shown here as decreasing effective normal stress) and thus shorter recurrence times with increasing temperature. Thin dashed lines are 200 °C isotherms. Moho, Mohorovičić discontinuity. This figure is modified from ref. 22, with permission.


pore-fluid diffusion during shear, which stabilizes slip and prevents the 
development of a seismic rupture. In these simulations, dilatancy also 
modulates the periodicity of slow-slip events, with increasing dilatancy 
leading to an increase of recurrence time, slip amplitude and duration 
of slow-slip events'*. These models suggest that an updip increase of 
dilatancy in the slow-slip zone produces less frequent, slower-slipping 
ETS in the updip part and more frequent, faster-slipping short-term 
slip events at greater depths. However, it is not clear through which process changes in forearc vp/vs would produce, or be the result of, corresponding changes in dilatancy that lead to the observed linear relation
with recurrence time. We speculate that temperature-dependent silica 
deposition, permeability reduction and overpressure development con- 
trol dilatancy and thus slow-earthquake behaviour. The lower edge of 
the episodic slip zone presumably marks a transition to fully ductile 
flow controlled by dislocation creep, in which fluids act as a weakening 
factor but pore pressure and dilatancy effects are no longer important. 


METHODS SUMMARY 


Data used in this study come from several broadband seismic stations located in subduction zone forearcs that exhibit ETS (Extended Data Fig. 1). At each station we compile three-component data with high signal-to-noise ratio (>7.5 dB) on the vertical component from 1990–2011 for earthquakes of surface-wave magnitude M > 5.8. Seismograms are decomposed into upgoing P- and S-wave modes and are deconvolved using Wiener spectral deconvolution. Receiver functions are filtered at corner frequencies of 0.05 Hz to 0.5 Hz and stacked into 7.5° back-azimuth and 0.002 s km⁻¹ slowness bins. We model waveforms using a fast ray-based forward algorithm”. We use a two-layer crustal model with fixed P-wave velocity of 6.5 km s⁻¹ composed of continental forearc crust overlying a dipping low-velocity layer representing subducting oceanic crust, underlain by a mantle half-space with fixed P- and S-wave velocities of 8.0 km s⁻¹ and 4.5 km s⁻¹, respectively. Strike and dip of the low-velocity layer are taken from global and local slab models*’. The misfit is calculated using a normalized correlation scheme and includes both radial and transverse components. Cumulative variance within each receiver function bin is used as an inverse weight in the misfit calculation, and the Monte Carlo inversion for model parameters is carried out using a Neighbourhood Algorithm’. Results for Cascadia are taken directly from refs 19 and 21. In Costa Rica we extract a subset of data from the southeastern Nicoya peninsula”’. Extended Data Table 1 shows all other measurements for Japan, Mexico and northern New Zealand.
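
The binning step can be sketched as follows: each receiver function is assigned to a 7.5° back-azimuth bin and a 0.002 s km⁻¹ slowness bin, and traces sharing a bin are averaged. The sketch also returns the per-sample variance and trace count of each bin, since within-bin variance later serves as a misfit weight. It assumes the traces are already deconvolved, filtered and sampled on a common time axis; the function name, return layout and example traces are hypothetical.

```python
import numpy as np

def stack_receiver_functions(rfs, back_azimuth_deg, slowness_s_per_km,
                             baz_bin_deg=7.5, slowness_bin=0.002):
    """Average receiver functions that fall in the same back-azimuth/slowness bin.

    Returns {(baz_index, slowness_index): (mean_trace, per_sample_variance, count)}.
    """
    rfs = np.asarray(rfs, dtype=float)
    baz_idx = np.floor(np.mod(back_azimuth_deg, 360.0) / baz_bin_deg).astype(int)
    slow_idx = np.floor(np.asarray(slowness_s_per_km, dtype=float) / slowness_bin).astype(int)
    stacks = {}
    for key in sorted(set(zip(baz_idx.tolist(), slow_idx.tolist()))):
        members = rfs[(baz_idx == key[0]) & (slow_idx == key[1])]
        stacks[key] = (members.mean(axis=0), members.var(axis=0), len(members))
    return stacks

# Hypothetical two-sample traces from three events
stacks = stack_receiver_functions(
    [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]],
    back_azimuth_deg=[10.0, 12.0, 100.0],
    slowness_s_per_km=[0.061, 0.0612, 0.065],
)
for key, (mean, var, count) in stacks.items():
    print(key, mean, var, count)
```

Here the first two events share a bin and are averaged, while the third forms its own single-trace bin.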


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Receiver function analysis. Data used in this study come from several permanent 
and portable networks of broadband seismic stations located in the forearc of sub- 
duction zones that exhibit ETS (Extended Data Fig. 1). At each station we compile 
all available three-component data with high signal-to-noise ratio (>7.5 dB) on the 
vertical component from 1990-2011, magnitude M > 5.8 earthquakes at teleseis- 
mic distances. Seismograms are decomposed into upgoing P- and S-wave modes 
and are deconvolved using Wiener spectral deconvolution to obtain radial and transverse P-wave receiver functions. Receiver functions are then filtered at corner frequencies of 0.05 Hz to 0.5 Hz and stacked into 7.5° back-azimuth and 0.002 s km⁻¹ slowness bins. Coherent signals on the receiver functions represent direct P-to-S
conversions and free-surface P-to-S and S-to-S reverberations from velocity con- 
trasts within the underlying column’. Each scattered phase displays oppositely 
polarized pulses that are characteristic of a prominent, dipping low-velocity layer. 
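
A frequency-domain Wiener deconvolution of the radial component by the vertical component can be sketched as below. The Methods do not spell out the regularization, so this sketch assumes a constant white-noise floor equal to a small fraction of the peak vertical power; the function name, the noise fraction and the synthetic test traces are hypothetical, and the 0.05–0.5 Hz band-pass that follows in the processing is not included.

```python
import numpy as np

def wiener_deconvolve(radial, vertical, noise_fraction=1e-3):
    """Radial receiver function by frequency-domain Wiener deconvolution.

    A white-noise floor equal to noise_fraction times the peak vertical power
    stabilises the spectral division where the vertical spectrum is weak.
    """
    radial = np.asarray(radial, dtype=float)
    vertical = np.asarray(vertical, dtype=float)
    n = radial.size
    nfft = 2 * n  # zero-pad so the division acts like linear, not circular, deconvolution
    R = np.fft.rfft(radial, nfft)
    Z = np.fft.rfft(vertical, nfft)
    power = np.abs(Z) ** 2
    floor = noise_fraction * power.max()
    rf = np.fft.irfft(R * np.conj(Z) / (power + floor), nfft)
    return rf[:n]  # non-negative lags, direct P aligned at index 0

# Synthetic check: radial = vertical convolved with a direct arrival plus a
# half-amplitude conversion 40 samples later
t = np.arange(512)
vertical = np.exp(-0.5 * ((t - 50) / 3.0) ** 2)
spikes = np.zeros(512)
spikes[0], spikes[40] = 1.0, 0.5
radial = np.convolve(vertical, spikes)[:512]
rf = wiener_deconvolve(radial, vertical)
print(np.argmax(rf), round(rf[40] / rf[0], 2))
```

The noise floor trades resolution for stability: a larger fraction suppresses noise amplification at frequencies where the vertical spectrum is weak but broadens the recovered pulses.
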
Waveform modelling. We model waveforms using a fast ray-based forward algorithm for waves in dipping, anisotropic media”. We use a two-layer crustal model with fixed P-wave velocity of 6.5 km s⁻¹ composed of continental forearc crust overlying a dipping low-velocity layer representing subducting oceanic crust, underlain by a mantle half-space with fixed P-wave and S-wave velocities of 8.0 km s⁻¹ and 4.5 km s⁻¹, respectively. Parameters that we estimate are the thickness and vp/vs of
each crustal layer. Our results are only weakly sensitive to variations in the back- 
ground P-wave velocity structure. Strike and dip of the low-velocity layer are taken 
from global and local slab models**. The misfit is calculated using a normalized 
correlation scheme and includes both radial and transverse components. Cumula- 
tive variance within each receiver function bin is used as an inverse weight in the 
misfit calculation and the Monte Carlo inversion for model parameters is carried 
out using a Neighbourhood Algorithm’’. Results for Cascadia are taken directly 
from refs 19 and 21. In Costa Rica we extract a subset of data from the southeastern 
Nicoya peninsula, where large slow slip occurs”’. Extended Data Table 1 shows all 
other measurements for Japan, Mexico and northern New Zealand. 
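
The misfit is described only in outline: a normalized correlation between observed and modelled radial and transverse receiver functions, with cumulative within-bin variance as an inverse weight. One plausible reading is sketched below, in which each bin and component contributes one minus its zero-lag normalized correlation, weighted by the reciprocal of its variance. The exact functional form, the names and the example traces are assumptions, and the Neighbourhood Algorithm search over layer thicknesses and vp/vs is not reproduced.

```python
import numpy as np

def normalized_correlation(observed, modelled):
    """Zero-lag normalized cross-correlation of two equal-length traces."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    denom = np.linalg.norm(observed) * np.linalg.norm(modelled)
    return 0.0 if denom == 0.0 else float(np.dot(observed, modelled) / denom)

def weighted_misfit(bins):
    """Weighted misfit over (observed, modelled, cumulative_variance) tuples.

    Each tuple is one bin of one component (radial or transverse); a misfit of
    0 means perfect correlation everywhere, 2 means perfect anti-correlation.
    """
    num = 0.0
    den = 0.0
    for observed, modelled, variance in bins:
        if variance <= 0.0:
            raise ValueError("cumulative variance must be positive")
        weight = 1.0 / variance  # noisier bins count for less
        num += weight * (1.0 - normalized_correlation(observed, modelled))
        den += weight
    return num / den

trace = np.sin(np.linspace(0.0, 6.0, 50))
print(weighted_misfit([(trace, trace, 1.0), (trace, -trace, 3.0)]))
```

In the example the perfectly fitting bin has three times the weight of the anti-correlated one, so the combined misfit is 0.5 rather than the unweighted 1.0.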

Fluid flux from quartz deposition. Our data suggest a massive addition of silica 
to the deep continental forearc crust, possibly reaching 20% quartz by volume 
locally''. Considering the time-integrated flux of 4.5 X 10° m? of fluids per m? of 
rock needed to precipitate quartz by regionally flowing fluids from an average local- 
equilibrium silica-solubility gradient at 350 °C-450 °C and 6-8 kbar (ref. 12), a 20% 
silica enrichment requires a steady-state fluid flux of about 20 mm yr⁻¹ over the 40
million years of Cascadia subduction". This flux is about two orders of magnitude 
higher than fluid production rates estimated from slab dehydration processes”. 
However, the local fluid production at the base of the crust near the mantle wedge 
corner may be greatly increased because complete retrograde serpentinization occurs 
early owing to the small wedge volume”. Silica-rich fluids are thus no longer consumed 
by further serpentinization, which may significantly increase fluid fluxes near the
bottom of the forearc crust. 
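
The steady-state flux quoted above can be checked with simple unit arithmetic: about 20 mm yr⁻¹ sustained for 40 million years amounts to a time-integrated flux of 8 × 10⁵ m³ of fluid per m² of cross-section. The sketch assumes a constant flux; reading "about two orders of magnitude" as a factor of exactly 100 to back out a dehydration-derived flux is an illustrative assumption, not a value from the study.

```python
def time_integrated_flux(flux_mm_per_yr, duration_yr):
    """Cumulative fluid volume per unit area (m^3 per m^2) for a constant flux."""
    return flux_mm_per_yr * 1.0e-3 * duration_yr  # mm/yr -> m/yr, times years

required = time_integrated_flux(20.0, 40.0e6)
print(f"time-integrated flux: {required:.1e} m^3 per m^2")  # 8.0e+05

# Illustrative only: a flux 100 times smaller, reading "two orders of magnitude" literally
dehydration_mm_per_yr = 20.0 / 100.0
print(f"implied dehydration-derived flux: ~{dehydration_mm_per_yr} mm/yr")
```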

Bootstrap analysis. We performed a bootstrap analysis of the trend between forearc vp/vs and plate interface depth. For this analysis we extracted 10,000 random sets of samples (with replacement) from the original data and calculated both the coefficient of correlation ρ and the coefficient of determination r² from a linear regression of each set. Median values are 0.8 and 0.6 for ρ and r², respectively, indicating a reasonably good fit. Finally, we determined the probability density function of the bootstrapped regression lines as a function of plate interface depth. The result is a two-dimensional map of the probability density function based on the data, shown in Fig. 2 as contours of constant probability density function values (0.01). The range of regression lines is also plotted in Fig. 2.
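
The bootstrap described above can be sketched in a few lines of NumPy: resample (depth, vp/vs) pairs with replacement, then record the correlation coefficient and the least-squares line for each resample. Degenerate resamples in which all depths coincide are skipped because the correlation is undefined for them. The function name, the seed handling and the exactly linear test data are hypothetical, not the published profile.

```python
import numpy as np

def bootstrap_linear_fit(x, y, n_boot=10_000, seed=0):
    """Bootstrap the correlation and least-squares line of y on x.

    Returns the median correlation coefficient, the median coefficient of
    determination and the slope/intercept of every accepted resample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    rhos, slopes, intercepts = [], [], []
    for _ in range(n_boot):
        idx = rng.integers(0, x.size, size=x.size)  # resample pairs with replacement
        xs, ys = x[idx], y[idx]
        if np.ptp(xs) == 0.0 or np.ptp(ys) == 0.0:
            continue  # correlation is undefined for a degenerate resample
        rhos.append(np.corrcoef(xs, ys)[0, 1])
        slope, intercept = np.polyfit(xs, ys, 1)
        slopes.append(slope)
        intercepts.append(intercept)
    rhos = np.asarray(rhos)
    return {
        "median_rho": float(np.median(rhos)),
        "median_r2": float(np.median(rhos ** 2)),  # r^2 = rho^2 for a straight-line fit
        "slopes": np.asarray(slopes),
        "intercepts": np.asarray(intercepts),
    }

# Hypothetical, exactly linear test data (vp/vs decreasing with depth)
depth_km = np.arange(20.0, 46.0, 5.0)
vp_vs = 2.0 - 0.01 * depth_km
fit = bootstrap_linear_fit(depth_km, vp_vs, n_boot=1000)
print(fit["median_rho"], np.median(fit["slopes"]))
```

For a trend that decreases with depth the correlation comes out negative, so its magnitude is what compares with the quoted median. Evaluating each bootstrapped line on a grid of plate-interface depths and histogramming the values at each depth gives a two-dimensional density map comparable to the contours in Fig. 2.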

Data sources. In Fig. 1 we compare vp/vs estimates with recurrence times of slow
earthquakes for five different subduction zones. These include Cascadia, Costa Rica, 
Mexico, southwest Japan (Nankai, Ryukyu) and Hikurangi, New Zealand. Data 
sources for the recurrence times are listed in Extended Data Table 2. We note that 
Vergnolle et al.** suggest that only the largest ETS at Guerrero recurring every four 
years are well documented. We also compile vp/vs estimates from various forms of 
seismic tomography models for each subduction zone (Extended Data Table 2). 
Extended Data Figure 1 | Examples of receiver functions and inversion
are in yellow, contours of slow-slip patches from ref. 2 are in light green, contours and epicentres of tremors from ref. 2 are in purple, and station locations used in this study are shown as inverted red triangles. For a subset of stations (PLAY, PXZ and IGK, identified by the blue squares on the maps) we show the observed (top, A) and modelled (bottom, B) radial receiver functions ordered by back-azimuth and, for each back-azimuth, by slowness of the incoming P wave. c, A slice through model misfits, with warm colours indicating low values, showing the vp/vs of forearc crust versus the vp/vs of the low-velocity zone. The star shows the minimum value of the misfit plot (best-fitting value).
Extended Data Table 1 | vp/vs results from the inversion of receiver function data

Region      Station   Longitude   Latitude   Forearc vp/vs    Low-velocity zone vp/vs
Mexico      PLAY      -99.67      17.12      1.78 ± 0.05      2.5 ± 0.6
Mexico      XALT      -99.71      17.10      1.70 ± 0.01      2.7 ± 0.2
Mexico      XOLA      -99.62      17.16      1.69 ± 0.01      2.1 ± 0.3
Mexico      TICO      -99.54      17.17      1.66 ± 0.02      2.7 ± 0.9
Mexico      CARR      -99.51      17.21      1.71 ± 0.014     2.5 ± 0.4
Mexico      RIVI      -99.49      17.29      1.74 ± 0.01      2.8 ± 0.7
Mexico      ACAH      -99.47      17.36      1.69 ± 0.01      2.1 ± 0.8
Mexico      MAZA      -99.46      17.44      1.77 ± 0.01      2.9 ± 0.6
SW Japan    TSA       132.82      33.18      1.59 ± 0.02      2.2 ± 0.8
SW Japan    WTR       136.58      34.37      1.62 ± 0.05      2.6 ± 0.2
SW Japan    IGK       124.18      24.41      1.64 ± 0.01      2.8 ± 0.9
Hikurangi   PXZ       176.86      -40.03     1.98 ± 0.06      2.8 ± 0.5
Hikurangi   KNZ       177.67      -39.02     2.02 ± 0.03      2.5 ± 0.7

SW, Southwest.
SW, Southwest. 
Extended Data Table 2 | Data sources for ETS recurrence times and seismic velocity models

Subduction zone    ETS recurrence time          Forearc crust vp/vs
Cascadia           Brudzinski & Allen           Ramachandran & Hyndman
Costa Rica         Jiang et al.                 DeShon et al.
Central Mexico     Lowry                        Huesca-Perez & Husker
Southwest Japan    Heki & Kataoka; Obara        Matsubara, Obara & Kasahara
Hikurangi          Wallace & Beavan             Reyners et al.

Data are from refs 3, 11, 17, 32, 33, 34, 35, 37, 38, 39 and 40.
A unique property of many adult stem cells is their ability to exist in a non-cycling, quiescent state’. Although quiescence serves an essential role in preserving stem cell function until the stem cell is needed in tissue homeostasis or repair, defects in quiescence can lead to an impairment in tissue function’. The extent to which stem cells can regulate quiescence is unknown. Here we show that the stem cell quiescent state is composed of two distinct functional phases, G0 and an ‘alert’ phase we term GAlert. Stem cells actively and reversibly transition between these phases in response to injury-induced systemic signals. Using genetic mouse models specific to muscle stem cells (or satellite cells), we show that mTORC1 activity is necessary and sufficient for the transition of satellite cells from G0 into GAlert, and that signalling through the HGF receptor cMet is also necessary. We also identify G0-to-GAlert transitions in several populations of quiescent stem cells. Quiescent stem cells that transition into GAlert possess enhanced tissue regenerative function. We propose that the transition of quiescent stem cells into GAlert functions as an ‘alerting’ mechanism, an adaptive response that positions stem cells to respond rapidly under conditions of injury and stress, priming them for cell cycle entry.

Adult stem cells have been presumed to exist in one of two states: (1) the 
quiescent state in which the cell is not actively cycling and (2) the acti- 
vated state where the cell has committed to or is in the cell cycle**. In 
contrast to the cell cycle, which can be sub-divided into distinct phases, 
quiescence is not as well characterized. Emerging data suggest that stem 
cells can regulate quiescent functional properties*®. Studying the regu- 
lation of the transition of satellite cells (SCs) from the quiescent to the 
activated state, we made a curious observation—SCs in a muscle con- 
tralateral to the muscle in which we induced an injury (contralateral 
satellite cells, CSCs) responded to that distant injury and had cycling 
properties that were different from those in a non-injured animal (quiescent satellite cells, QSCs) and from the injured tissue (activated satellite cells, ASCs) (Fig. 1a). Using the Pax7CreER driver and Rosa26eYFP lineage tracer to specifically label SCs”* (Extended Data Fig. 1a), we
found that these CSCs showed markedly increased, but overall still low, 
propensity to cycle when compared to QSCs, as measured by BrdU 
(5-bromodeoxyuridine) incorporation in vivo (Fig. 1b). Upon isolation 
and culturing ex vivo, CSCs displayed accelerated cell cycle entry as 
measured by EdU incorporation and time required to complete the first 
cell division compared to QSCs (Fig. 1c, d). Subsequent cell divisions of 
progeny of CSCs and QSCs occurred at similar rates to those of ASCs 
(Extended Data Fig. 1b). This functional response was not limited to SCs 
in muscle groups directly contralateral to the injury or to the agent of 
muscle injury (Extended Data Fig. 1c-e). 

One of the most obvious changes in ASCs is a dramatic increase in 
cell size relative to QSCs (Fig. 2a). We found that CSCs displayed a very 
slight, but significant, increase in cell size relative to QSCs (Fig. 2a, b 
and Extended Data Fig. 2a, b). Similarly, we also observed that CSCs had 
stronger EYFP intensity from the Rosa26eYFP reporter, elevated levels of pyronin Y staining, and increased incorporation of the ribonucleotide EU compared to QSCs (Extended Data Fig. 2c–e), which suggests increased transcriptional activity. Principal component analysis (PCA) of the transcriptional profiles of QSCs, CSCs and ASCs showed that CSCs fall between QSCs and ASCs along the first component axis (PC1) (Fig. 2c). Transcriptionally, CSCs were highly correlated with both QSCs and ASCs, more strongly than QSCs and ASCs were correlated (Fig. 2c), which also
Figure 1 | Satellite cells distant from the site of injury have different cell cycle kinetics than quiescent and activated satellite cells. a, Schematic representation of the location of QSCs, CSCs and ASCs in relation to muscle injury. b, CSCs have greater propensity to cycle in vivo than do QSCs (n = 3; significance is versus QSCs). c, A higher percentage of CSCs than QSCs incorporate EdU (5-ethynyl-2′-deoxyuridine) after 40 h. Data from a representative experiment are presented (n = 2; significance is versus QSCs). d, CSCs require less time to complete the first division (n = 3). Details on data presentation and sample size can be found in the Methods Summary and full Methods sections. DPI, days post-injury.
Figure 2 | Satellite cells that are distant from an injury become ‘alert’. a, Representative images of QSCs, CSCs and ASCs immediately after isolation. b, CSCs are larger than QSCs (n = 3). c, CSCs have a transcriptional profile that is intermediate between QSCs and ASCs (along PC1) as shown by PCA and Pearson’s r values (n = 3). d, Increased mitochondrial activity in CSCs compared to QSCs (representative FACS plot, n = 4). Unst, unstained. e, CSCs have increased mtDNA content relative to QSCs (n = 3), measured by qRT-PCR and normalized to genomic DNA (gDNA). f, CSCs have more intracellular ATP than QSCs (n = 4). g, Immunofluorescence immunohistochemistry (IF-IHC) staining of tibialis anterior (TA) muscle showing representative pS6− and pS6+ SCs. h, Quantification of IF-IHC staining for pS6 in SCs (n = 3; significance is versus non-injured).
suggests that CSCs are intermediate between QSCs and ASCs. However, detailed immunocytochemistry analysis immediately after isolation showed that CSCs are phenotypically more similar to QSCs (Extended Data Fig. 2f–i). To test whether CSCs represent a population of stem cells or a population of committed progenitor cells, we performed transplantation and pulse-chase experiments and found no difference in the engraftment efficiency and capacity for self-renewal between CSCs and QSCs (Extended Data Fig. 2j, k). Together, these data suggest that CSCs are similar to, but distinct from, QSCs and possess the stem cell characteristics of QSCs.
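
The PCA comparison behind Fig. 2c can be sketched generically: center a samples-by-genes expression matrix, take its singular value decomposition and read off each sample's score on the first component. The sketch uses synthetic data in which the middle group is constructed to lie halfway between the other two; the data, group sizes and function name are hypothetical, and the study's own normalization and gene filtering are not reproduced.

```python
import numpy as np

def pca_scores(samples_by_genes, n_components=2):
    """Principal-component scores and explained-variance fractions via SVD."""
    X = np.asarray(samples_by_genes, dtype=float)
    Xc = X - X.mean(axis=0)  # center each gene across samples
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic profiles: 3 replicates per group, middle group halfway along one direction
rng = np.random.default_rng(1)
base = rng.normal(size=200)
shift = rng.normal(size=200)
groups = {"QSC": 0.0, "CSC": 0.5, "ASC": 1.0}
rows, labels = [], []
for name, frac in groups.items():
    for _ in range(3):
        rows.append(base + frac * shift + 0.1 * rng.normal(size=200))
        labels.append(name)
scores, explained = pca_scores(np.vstack(rows))
labels = np.array(labels)
pc1 = {name: scores[labels == name, 0].mean() for name in groups}
print(pc1, explained)
```

The sign of a principal component is arbitrary, so "intermediate along PC1" means the middle group's mean score lies between the other two, whichever direction PC1 happens to point.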

To gain further insight into what distinguishes CSCs from QSCs, we 
analysed the molecular pathways enriched in genes induced in the CSC 
transcriptome relative to the QSC expression profile. We found that two 
annotation groups were significantly enriched in genes upregulated in 
CSCs relative to QSCs: cell cycle and mitochondrial metabolism (Extended 
Data Fig. 3a). To further investigate mitochondrial metabolism in CSCs, 
we performed MitoTracker Deep Red staining and measured mtDNA 
content and found that, relative to QSCs, CSCs displayed evidence of 
elevated mitochondrial activity (Fig. 2d, e). Consistent with these findings, 
and in keeping with the increase in cell size, we also found that CSCs have
increased levels of cellular ATP (Fig. 2f and Extended Data Fig. 3b-d). 
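
Relative mtDNA content from qPCR is commonly expressed by comparing threshold cycles (Ct) of a mitochondrial amplicon against a genomic reference, assuming near-perfect amplification efficiency so that each cycle doubles the product. The sketch below uses that ΔCt/ΔΔCt convention; the study states only that mtDNA was normalized to gDNA, so the calculation, function names and Ct values are illustrative assumptions.

```python
def relative_mtdna(ct_mtdna, ct_gdna, efficiency=2.0):
    """mtDNA copies relative to genomic DNA; efficiency=2 assumes perfect doubling."""
    return efficiency ** (ct_gdna - ct_mtdna)

def fold_change(ct_mt_sample, ct_g_sample, ct_mt_ref, ct_g_ref, efficiency=2.0):
    """Fold change of relative mtDNA content in a sample versus a reference (ddCt)."""
    return (relative_mtdna(ct_mt_sample, ct_g_sample, efficiency)
            / relative_mtdna(ct_mt_ref, ct_g_ref, efficiency))

# Hypothetical Ct values: the mitochondrial target crosses threshold one cycle earlier in CSCs
print(fold_change(ct_mt_sample=19.0, ct_g_sample=25.0, ct_mt_ref=20.0, ct_g_ref=25.0))  # 2.0
```
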
Collectively these data describe a set of properties that distinguishes CSCs from QSCs and ASCs: kinetics of cell cycle entry, propensity to cycle, cell size, transcriptional activity and mitochondrial metabolism. Importantly, CSCs, like QSCs, are still quiescent in that, as a population, almost all CSCs are not actively cycling. Because the injury-induced phenotype of CSCs is intermediate between QSCs and ASCs, we refer to CSCs as ‘alert’ SCs and the set of properties that distinguishes these cells as the ‘alert’ phenotype. The characteristics of this alert phenotype described above have a common thread in that they have all been previously linked, in other systems, to the mTORC1 signalling pathway (reviewed in ref. 9). For example, we observed induction of phospho-S6 (pS6), a surrogate of mTORC1 activity, in alert SCs (Fig. 2g, h and Extended Data Fig. 3e–g). Furthermore, we found that by sorting SCs for properties of the alert state (Extended Data Fig. 3h), we enriched for a population of pS6+ SCs that also possessed the other attributes of the alert state (elevated propensity to cycle and reduced time to first division) (Extended Data Fig. 3i–m). Together these data show that there is a strong correlation between activation of mTORC1 signalling and the alert phenotype in SCs.

To test whether any aspects of the alert response were directly regulated by mTORC1 signalling, we used the Pax7CreER driver to specifically ablate TSC1, an inhibitor of mTORC1 signalling, in SCs. As a genetic model of mTORC1 activation”®, TSC1 knockout (KO) QSCs displayed induction of mTORC1 activity (Extended Data Fig. 4a, b). TSC1 KO QSCs also displayed all aspects of the alert phenotype in an otherwise non-injury context: increased propensity to cycle, accelerated cell cycle entry, increased MitoTracker Deep Red staining and increased cell size (Fig. 3a–c and Extended Data Fig. 4c). To test whether the alert response requires mTORC1, we used a conditional allele of Raptor (Rptor)", an essential component of the mTORC1 signalling complex, with the Pax7CreER driver to specifically ablate Rptor expression and suppress mTORC1 signalling in SCs (Extended Data Figs 4b and 5a–c). Overall, we found that Rptor KO SCs contralateral to a muscle injury were completely unresponsive to the injury and did not manifest any of the injury-induced
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Figure 3 | Activation of mTORC1 is necessary and sufficient for the alert 
phenotype. a—c, TSC1 KO QSCs display characteristics of alert SCs: increased 
propensity to cycle in vivo (a, n= 6); reduced time to first division (b, n = 3); 
increased mitochondrial activity (c, representative FACS plot n = 3). 

d-f, Rptor KO suppresses induction of the alert state. Rptor KO CSCs show no 
differences in: propensity to cycle in vivo (d, n = 6); time to first division 

(e, n = 3); and mitochondrial activity (f, representative FACS plot, n = 3). 
g-i, cMet KO CSCs show no injury-induced regulation of: propensity to cycle 
in vivo (g, n = 4); time to first division (h, n = 3); and mitochondrial activity 
(i, representative FACS plot, n = 3). 
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changes that alert wild-type CSCs display (Fig. 3d-f and Extended 
Data Fig 5d, e). These data combined show that mTORC1 signalling 
in SCs is necessary and sufficient for the alert response. 

Next, we focused on the signals upstream of mTORC1 that initiate the alert response and are regulated by injury. Latent hepatocyte growth factor (HGF) is found in the extracellular matrix of many tissues; upon injury it is processed into an active form by serum proteases. Active HGF can regulate mTORC1 via PI3K-Akt signalling. Furthermore, HGF is known to influence SC behaviour. To test whether HGF signalling has a role in the alert response, we used conditional ablation of the HGF receptor, cMet, to suppress HGF signalling in SCs. Ablation of cMet in SCs completely blocked the activation of mTORC1 signalling, as measured by pS6 staining, in cultured SCs and in vivo in CSCs following injury (Extended Data Figs 4b and 5f, g). Consistent with our hypothesis that mTORC1 activation is required for the alert response in SCs, cMet KO CSCs did not exhibit any functional response to injury (Fig. 3g-i and Extended Data Fig. 5h). Collectively, these data suggest that signalling downstream of cMet is critical for the induction of the alert response in SCs.

Following tissue repair after injury, activity of the HGF activation cascade gradually subsides. We found that the frequency of pS6+ CSCs following a distant injury declined to a level similar to that of non-injured animals by 28 days post-injury (DPI) (Extended Data Fig. 6a). At 28 DPI, the propensity to cycle and the cell cycle entry kinetics of CSCs had also returned to those of QSCs (Extended Data Fig. 6b, c), as had their transcriptional profile (Extended Data Fig. 6d). These data indicate that the alert state is reversible and that the functional and transcriptional changes in alert CSCs that occur downstream of mTORC1 revert to the properties of QSCs when mTORC1 activity subsides.

To gain further understanding of the molecular pathways underlying the functional transition into the alert state, we analysed the transcriptional profiles from the SC-specific genetic models described above. We found that induction of genes involved in mitochondrial metabolism strongly correlated with the ability to transition into the alert state: wild-type CSCs and TSC1 KO QSCs show this induction, whereas Rptor KO and cMet KO CSCs do not (Extended Data Figs 3a and 7a-e). These data suggest that regulation of mitochondrial metabolism is a crucial aspect of stem cell quiescence.

The function of SCs in response to injury is to proliferate, differentiate and form new muscle tissue. We therefore tested whether the
functional changes of CSCs affected their differentiation and muscle 
regenerative abilities. Following isolation and ex vivo culturing, CSCs 
displayed enhanced kinetics of differentiation as measured by express- 
ion of myogenin (MyoG) and cell fusion (Fig. 4a, b and Extended Data 
Fig. 8a). To translate these observations in vivo, we assessed the ability 
of CSCs to participate in muscle regeneration. Three days before injury 
of the left tibialis anterior (TA) muscle, we performed an ‘alerting’ injury 
to the right limb to transition SCs in the left TA into the alert state (Fig. 4c). 
We found that animals that received an ‘alerting’ injury displayed strik- 
ingly enhanced muscle regeneration at all time points following injury 
when compared to the normal muscle regenerative process (Fig. 4d, e). 
These data show that the functional properties of alert SCs translate 
into enhanced muscle regenerative ability in response to injury. 

The markedly enhanced muscle regenerative function of CSCs prompted us to investigate other conditions that may induce the alert state in SCs. We found that SCs adopted functional aspects of the alert response to bone injuries and to minor skin wounds (Extended Data Fig. 8b, c), injuries in which a role for SCs is not apparent. These data suggest that SCs can adopt the alert state in response to multiple types of injury and that this may be a general response of SCs to injury. We therefore tested whether other populations of quiescent stem cells could similarly adopt properties of the alert state. We found that fibro-adipogenic progenitors (FAPs), a resident mesenchymal stem cell population in skeletal muscle, responded in a similar way to SCs. CFAPs (FAPs in muscles of a limb contralateral to the site of muscle injury) displayed induction of mTORC1 signalling, accelerated cell cycle entry, increased propensity to cycle and increased cell size when compared to quiescent FAPs from non-injured animals (QFAPs) (Fig. 4f-h and Extended Data Fig. 9a-c). Additionally, we found that long-term haematopoietic stem cells (LT-HSCs) displayed activation of mTORC1 signalling in response to muscle injury (Fig. 4i and Extended Data Fig. 9d). To test whether mTORC1 activation in LT-HSCs increased their functional potential, as it does in SCs, we administered interferon-gamma (IFN-γ) to the animals to stimulate LT-HSC activation. Similar to the effect of an ‘alerting’ injury on muscle regeneration, LT-HSCs primed by muscle injury were more sensitive to IFN-γ and had a more robust response (Fig. 4j). Notably, and similar to what we demonstrated in SCs, induction of mTORC1 in HSCs increases their mitochondrial activity, which is consistent with a transition into the alert state. Collectively, these data indicate that activation of mTORC1 signalling in quiescent stem cells alters their properties, endowing them with enhanced functional potential: an alerting mechanism that prepares the cell for potential activation.

As it relates to stem cell biology, the data we present here demonstrate that stem cells undergo dynamic transitions between functional phases in the quiescent state. We propose a model in which GAlert and G0 are phases within quiescence and form a quiescence cycle (Fig. 4k). Although it has been suggested that not all quiescent cells are functionally equivalent, the in vivo relevance of, and the molecular mechanisms regulating, functionally distinct states had not previously been elucidated. We propose that mTORC1 activity distinguishes at least two phases within quiescence. Here we demonstrate how these phases of stem cell quiescence in vivo are regulated under physiological conditions by mTORC1 (and, for SCs, by cMet). Most importantly, our data indicate that the ability to transition between G0 and GAlert is critical for positioning stem cell populations to respond rapidly in tissue homeostasis and repair while maintaining a pool of deeply quiescent, reserve stem cells. This represents a newly identified form of cellular memory, an adaptive response akin to that in neuronal or immune cells, in which prior experience influences future responses.


METHODS SUMMARY 


Unless stated otherwise in the figure legend, all graphical data are presented as mean ± s.e.m., except for histograms, and significance was calculated using two-tailed unpaired Student’s t-tests: *P < 0.05, **P < 0.01. When sample sizes (n values) are reported as a range, exact sample sizes can be found in the Methods section. Time to first division experiments are presented as a representative histogram plotting data from individual cells and, on the right, as a bar graph depicting the mean time to first division in replicate experiments.

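The bar graphs summarize replicate experiments as mean ± s.e.m. A minimal sketch of that summary, assuming (not stated in the text) that each replicate experiment is first collapsed to its mean per-cell time and the s.e.m. is then taken across replicates; the function names and example times are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, s.e.m.) of replicate values; s.e.m. = sample s.d. / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("an s.e.m. needs at least two replicates")
    return mean(values), stdev(values) / math.sqrt(len(values))


def summarize_replicates(per_cell_times_by_replicate):
    """Collapse each replicate experiment to its mean time to first division
    (hours), then report mean ± s.e.m. across replicate experiments."""
    replicate_means = [mean(times) for times in per_cell_times_by_replicate]
    return mean_sem(replicate_means)


# Hypothetical per-cell times (h) from three replicate experiments.
m, sem = summarize_replicates([[30, 32, 34], [28, 30], [33, 35, 37]])
print(f"{m:.1f} ± {sem:.2f} h")  # 32.0 ± 1.73 h
```

Averaging within each replicate first keeps one experiment with many tracked cells from dominating the error bar; pooling all cells instead would give a much smaller, per-cell s.e.m.
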
METHODS

Mice. Tsc1flox/flox, Rptorflox/flox, cMetflox/flox, Rosa26eYFP and NSG mice were obtained from the Jackson Laboratory. Pax7CreER mice were provided by Dr Charles Keller (OHSU). All experiments were performed with 12-16-week-old male C57BL/6 mice, except in experiments with conditional cMet KO and associated controls, which were in a mixed C57BL/6 and FVB background. Animals were genotyped by PCR of tail DNA, except in Extended Data Fig. 5c, for which genotyping was performed on isolated SCs and FAPs; primer sequences are available upon request. The genotypes of experimental KO and associated control animals are as follows: TSC1 KO (Tsc1f/f; Pax7CreER/+; Rosa26eYFP/+) and wild type (Tsc1+/+; Pax7CreER/+; Rosa26eYFP/+); Rptor KO (Rptorf/f; Pax7CreER/+; Rosa26eYFP/+) and wild type (Rptor+/+; Pax7CreER/+; Rosa26eYFP/+); cMet KO (cMetf/f; Pax7CreER/+; Rosa26eYFP/+) and wild type (cMet+/+; Pax7CreER/+; Rosa26eYFP/+). Tamoxifen (TMX) (Sigma) was prepared in a mixture of corn oil and 7% ethanol and administered in 5 doses of 50 mg every 2-3 days by intraperitoneal (i.p.) injection. TMX injections were initiated in 6-8-week-old mice, and experimental mice were used 2-4 weeks after TMX administration. In pulse-labelling experiments, 10 mg of BrdU was injected i.p. 12 h before mice were euthanized. In continuous labelling experiments, BrdU (0.8 mg ml−1) was administered in the drinking water with 1% sucrose for the indicated period. Mice were housed and maintained in the Veterinary Medical Unit at the Veterans Affairs Palo Alto Health Care System. Animal protocols were approved by the Administrative Panel on Laboratory Animal Care of the VA Palo Alto Health Care System.

Injury models. BaCl2 was used as the agent for muscle injury unless stated otherwise. Briefly, mice were anaesthetized with isoflurane, and the lower hindlimb was shaved and the skin sterilized. A total of 70 µl of 1.2% BaCl2 (w/v in H2O) was injected into and along the length of the tibialis anterior (TA) and gastrocnemius (Gas) muscles. Mice were given analgesic and antibiotics and allowed to recover. Muscle crush injuries were performed by opening the skin and fascia over the TA muscle, injuring the muscle with a haemostat along the length of the TA, and closing the skin with sutures. Skin injuries were performed by using scissors to make a 2-cm incision on the abdomen without injuring the peritoneum, and closing the incision with wound clips. In all experiments, control, non-injured animals were subjected to a mock injury: animals were anaesthetized and administered analgesic and antibiotics.

Muscle regeneration. Alert regeneration experiments were performed as depicted in Fig. 4c, by injuring the TA and Gas muscles in the right hindlimb on day −3, followed by injuring the TA muscle in the left hindlimb on day 0. Control, normal regeneration animals were anaesthetized and given antibiotics and analgesics on day −3. On day 0, normal regeneration animals were subjected to injuries identical to those of alert regeneration animals. At the indicated DPI, animals were euthanized, and the left TAs were extracted and prepared for histological analysis by freezing in liquid-nitrogen-cooled isopentane. TA muscles extracted 3.5 and 5 DPI were sectioned and subjected to IF-IHC staining with eMHC and laminin antibodies to identify nascent muscle fibres. TA muscles extracted 11, 15 and 24 DPI were sectioned and subjected to haematoxylin and eosin (H&E) staining to identify centrally nucleated fibres. The cross-sectional area (CSA) of nascent fibres in the posterior portion of the TA muscle, approximately halfway along the proximal-distal axis, was measured using AxioVision software (Zeiss).

Cell isolation and FACS purification. Satellite cell isolation was performed as previously described. Briefly, after the mice were euthanized, hindlimb skeletal muscles were removed, minced and digested in collagenase and dispase. Satellite cells were purified by gating mononuclear EYFP+ cells using a BD FACSAria II or III. As depicted in Fig. 1a, QSCs were isolated from TA and Gas muscles of non-injured animals, ASCs were isolated from injured TA and Gas muscles, and CSCs were isolated from TA and Gas muscles contralateral to the injury.

FAPs were purified from muscles isolated and digested as in the SC isolation protocol. Following digestion, FAPs were stained and purified as a population of CD31−/CD45− and Sca-1+ mononuclear cells, using the following antibodies: CD31-APC (clone MEC 13.3; BD Biosciences), CD45-APC (clone 30-F11; BD Biosciences) and Sca-1-PacBlue (clone D7; BioLegend).

All FACS comparisons (size, MitoTracker, EYFP, EU, pS6) are from isolations performed on the same day and analysed on the same FACS instrument (BD FACSAria II or III). All FACS plots are representative experiments with similar results in at least 3 independent experiments.

Analysis of HSCs. For mTOR analysis, Lin− (lineage markers: B220, CD4, CD8, CD11b, Ly6G, Ter119) cKit+Sca-1+CD150+ cells were obtained from bone marrow 24 h after injury or mock injury and stained with phospho-mTOR or phospho-S6 antibodies.

For IFN-γ experiments, animals were subjected to muscle injury or mock injury as described above. After 24 h, HSC division was induced with IFN-γ (100 ng per mouse) or PBS by intravenous injection, and 4 mg BrdU was injected i.p. After an additional 24 h, the mice were euthanized and BrdU incorporation was measured in HSCs (Lin−Sca-1+cKit+CD150+CD48−EPCR+) (BrdU Flow Kit; BD Biosciences).
Satellite cell transplantation. Donor EYFP+ SCs were FACS-purified from Pax7CreER/+; Rosa26eYFP/+ mice, either 2.5 DPI (CSCs) or from non-injured animals (QSCs). Donor SCs were counted, washed with PBS and resuspended in PBS with 0.1% BSA at a concentration of 5 × 10^4 cells in 20 µl. The TA muscles of the hosts, 12-week-old male NSG mice, were prepared by BaCl2 injury two days before transplantation. Transplantation was performed by anaesthetizing the host mice, exposing the TA muscle by opening the skin and fascia, and slowly injecting 20 µl of the donor cell solution into the TA using a 50-µl Hamilton syringe. Each host mouse received QSCs in the right TA and CSCs in the left TA. After injection, the skin over the TA was closed with sutures, and the mice were administered analgesic and antibiotics and allowed to recover. Two weeks after transplantation, SC engraftment was measured by FACS of EYFP+ SCs from the TA muscles of the host mice.
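
The Extended Data Fig. 2j legend defines engraftment efficiency as the number of EYFP+ SCs recovered from the host TA as a percentage of the donor SCs transplanted. A minimal sketch of that arithmetic; the recovery counts below are hypothetical placeholders, not data from the study.

```python
def engraftment_efficiency(recovered_eyfp_scs, transplanted_scs):
    """Recovered EYFP+ SCs as a percentage of transplanted donor SCs."""
    if transplanted_scs <= 0:
        raise ValueError("transplanted_scs must be positive")
    return 100.0 * recovered_eyfp_scs / transplanted_scs


TRANSPLANTED_PER_TA = 5 * 10**4  # donor cells per TA (Extended Data Fig. 2j legend)

# Hypothetical recovery counts for one host: QSC-injected TA and CSC-injected TA.
qsc_pct = engraftment_efficiency(2500, TRANSPLANTED_PER_TA)
csc_pct = engraftment_efficiency(2750, TRANSPLANTED_PER_TA)
print(qsc_pct, csc_pct)  # 5.0 5.5
```

Because each host receives both populations (one per TA), the two percentages from the same mouse can be compared directly.
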

MitoTracker staining. Before FACS analysis, 200 nM MitoTracker Deep Red (Invitrogen) was added to muscle digests and incubated for 1 h at 37 °C with gentle shaking. Digests were washed once, and MitoTracker staining was visualized on a BD FACSAria II or III in the APC channel.

Cell culture. SCs and FAPs were cultured on ECM (Sigma)-coated poly-D-lysine 8-well chamber slides (BD). A total of 15,000 cells were plated per well and cultured overnight in Ham’s F10 medium (Cellgro) with 10% fetal bovine serum (Gibco); the next day, the medium was switched to Ham’s F10 with 10% horse serum (Gibco), and cells were cultured in this medium until fixation with PFA. In EdU incorporation experiments, 0.05 mM EdU (Invitrogen) was added to the culture medium and replenished every 12 h until cells were fixed. SCs were fixed 40 h after plating and FAPs were fixed 48 h after plating. For in vivo BrdU pulse-labelling experiments, cells were isolated, plated and fixed 2 h after isolation.

Time-lapse microscopy. For time-lapse microscopy analysis, 10,000 SCs were plated onto ECM-coated 8-well chamber slides and allowed to adhere for several hours. After the cells adhered, the medium was changed to Ham’s F10 with 10% HS, and the slides were transferred to an environmentally controlled Zeiss Axiophot 200M equipped with an AxioCam. Time-lapse data acquisition and visualization were performed using AxioVision software, and images were captured every 15 min. The time required to complete the first division after plating was recorded only for cells that stayed within the acquisition field. Representative data from time-lapse experiments are displayed as histograms; bar graphs of time to first division display quantification of replicate experiments (mean ± s.e.m.).
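
Images were captured every 15 min, and only cells that stayed in the acquisition field were scored. A minimal sketch of converting per-cell division frames into hours after plating under those rules. The track format, the `start_offset_h` parameter (time from plating to the first frame) and the handling of cells that never divided are assumptions for illustration.

```python
def first_division_times_h(tracks, frame_interval_min=15.0, start_offset_h=0.0):
    """Convert per-cell division frames into hours after plating.

    tracks: list of dicts with keys 'division_frame' (int, or None if the cell
    never divided during recording) and 'left_field' (bool). Hypothetical format.
    start_offset_h: hypothetical delay between plating and the first frame
    (cells were allowed to adhere for several hours before imaging).
    """
    times = []
    for track in tracks:
        if track["left_field"]:
            continue  # only cells that stayed in the acquisition field are scored
        if track["division_frame"] is None:
            continue  # no first division observed in this recording
        times.append(start_offset_h + track["division_frame"] * frame_interval_min / 60.0)
    return times


tracks = [
    {"division_frame": 120, "left_field": False},
    {"division_frame": 100, "left_field": True},
    {"division_frame": None, "left_field": False},
]
print(first_division_times_h(tracks, start_offset_h=3.0))  # [33.0]
```

The source does not say how cells that never divided during recording were treated; dropping them, as here, shortens the apparent mean time, so that choice should be stated whenever it is used.
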

ATP measurement. SC ATP levels were measured using the ATP Bioluminescence 
Assay Kit CLS II (Roche) according to the manufacturer’s instructions. Briefly, 
20,000 SCs were counted with a haemocytometer immediately after isolation, pelleted, 
and boiled in 100 mM Tris, 4 mM EDTA, pH 7.4 for 2 min. After boiling, the debris 
was pelleted and supernatant was used for analysis. 

Immunostaining. Immunofluorescence immunohistochemistry (IF-IHC) was performed on muscle tissue that was mounted with tragacanth gum and snap-frozen in liquid-nitrogen-cooled isopentane immediately after dissection. Sections (8 µm) were fixed in 4% PFA for 5 min and blocked in donkey serum before staining. Pax7 and eMHC staining was performed with the M.O.M. kit (Vector) according to the manufacturer’s instructions. pS6 staining was performed following Pax7 staining by incubating the sections overnight in a solution of PBS, 0.3% Triton X-100, 10% donkey serum and rabbit anti-pS6 antibodies at a dilution of 1:100. Secondary detection of pS6 was performed with donkey anti-rabbit Alexa 647 antibodies (Invitrogen) at a dilution of 1:500.

Immunocytochemistry (ICC) staining was performed on PFA-fixed cells that had been cultured on chamber slides. EdU incorporation was visualized by Click-iT (Invitrogen) according to the manufacturer’s instructions. For BrdU analysis, cells were fixed with 70% ethanol and treated with 2 N HCl for 20 min before staining with BrdU antibodies. Image capture, analysis and quantification were performed using Volocity software.

All displayed immunostaining images are representative of at least 3 independ- 
ent experiments. 

Western blotting. Western blot analysis was performed on whole-cell extracts of 1 × 10^5 SCs that were counted, washed and lysed in sample buffer immediately after FACS purification. Lysates were subjected to SDS-PAGE, transferred to PVDF membrane and probed with the indicated antibodies. Between antibody probings, the PVDF membrane was stripped using Restore Western Blotting Stripping Solution (Pierce).

Bioluminescence. Photon emission was measured using 10,000 SCs (CD31−, CD45−, Sca-1−, VCAM+) purified by FACS from non-injured or 2.5 DPI Pax7CreER/+; Rosa26Luc/+ mice. SCs were allowed to adhere to ECM-coated 6-well plates for 1 h before addition of luciferin and imaging with an IVIS imager.

mtDNA quantification. Total DNA was isolated from 10,000 SCs immediately after FACS isolation using the QIAamp DNA Micro Kit (Qiagen) according to the manufacturer’s instructions. mtDNA was quantified by real-time quantitative PCR using primers amplifying the cytochrome B region of mtDNA (forward primer: 5′-CATTTATTATCGCGGCCCTA-3′; reverse primer: 5′-TGTTGGGTTGTTTGATCCTG-3′) relative to the β-globin region of genomic DNA (forward primer: 5′-GAAGCGATTCTAGGGAGCAG-3′; reverse primer: 5′-GGAGCAGCGATTCTGAGTAGA-3′).

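mtDNA is quantified relative to the nuclear β-globin amplicon. A minimal sketch of one common way to express this, the delta-Ct method, assuming equal and near-perfect amplification efficiency for both amplicons; the source does not state which calculation was used, and the Ct values below are hypothetical.

```python
def relative_mtdna(ct_cytb, ct_beta_globin, efficiency=2.0):
    """mtDNA copies relative to nuclear DNA by the delta-Ct method.

    ct_cytb: Ct for the mitochondrial cytochrome B amplicon.
    ct_beta_globin: Ct for the nuclear beta-globin amplicon.
    efficiency: fold amplification per cycle, assumed equal for both
    amplicons (2.0 = perfect doubling).
    """
    # Fewer cycles needed for cytB than for beta-globin => more mtDNA copies.
    return efficiency ** (ct_beta_globin - ct_cytb)


def fold_change_vs_reference(sample_cts, reference_cts):
    """Delta-delta-Ct: relative mtDNA in a sample (e.g. CSCs) over a reference (e.g. QSCs).
    Each argument is a (ct_cytb, ct_beta_globin) pair."""
    return relative_mtdna(*sample_cts) / relative_mtdna(*reference_cts)


# Hypothetical Ct values (cytB, beta-globin).
print(relative_mtdna(18.0, 26.0))  # 256.0
print(fold_change_vs_reference((18.0, 26.0), (19.0, 26.0)))  # 2.0
```

If the two amplicons' efficiencies differ measurably, a standard-curve approach per amplicon is the safer alternative to a shared `efficiency`.
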
Cell size measurements 

FACS. Throughout the dissection, digestion and processing of muscle tissue (described above), preparations were maintained at 37 °C. Immediately upon completion of the isolation protocol, mononuclear EYFP+ cells were analysed in the forward scatter (FSC) channel by FACS to assess cell size. Data are presented as representative histograms.

Microscopy. Immediately after isolation, QSCs, 2.5 DPI CSCs and 2.5 DPI ASCs were plated on 8-well chamber slides and allowed to adhere for 30 min at 37 °C, after which the medium was aspirated, the chambers removed and a coverslip applied. Bright-field images of the slides were acquired using an Axioskop 2 with a ×40 objective lens. Cell diameters were analysed using AxioVision software (Zeiss).
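
Extended Data Fig. 3b reports cell volume calculated from these cell size measurements. A minimal sketch, assuming (not stated in the source) that each cell is treated as a sphere, so V = πd³/6 from the measured diameter d; function names are hypothetical.

```python
import math
from statistics import mean


def sphere_volume_um3(diameter_um):
    """Volume of a sphere from its diameter: V = pi * d**3 / 6 (um -> um^3)."""
    return math.pi * diameter_um ** 3 / 6.0


def mean_volume_um3(diameters_um):
    """Mean of per-cell volumes (not the volume of the mean diameter)."""
    return mean(sphere_volume_um3(d) for d in diameters_um)


print(round(sphere_volume_um3(10.0), 1))  # 523.6
```

Because volume scales with the cube of diameter, a small shift in the diameter distribution (as between QSCs and CSCs) appears roughly three times larger in relative volume.
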

Antibodies. Antibodies used in this study were: anti-Pax7 (#pax7) and anti-eMHC (#F1.652) from DSHB; phospho-S235/6 S6 (#4858 and #4857) and phospho-S2448 mTOR (#2971) from Cell Signaling; rabbit anti-GFP (#A11122) from Invitrogen; chicken anti-GFP (#ab15580), rat anti-laminin (#ab11576) and anti-Raptor (#ab40768) from Abcam; anti-PDGFRα (#AF1062) from R&D Systems; anti-Myogenin (#566358) and anti-MyoD (#554130) from BD; anti-BrdU (#OBD0030G) from Serotec; anti-actin (#A3854) from Sigma. For HSCs, lineage antibodies anti-B220 (#515-0452-82), anti-CD4 (#515-0042-82), anti-CD8 (#515-0081-82), anti-Mac1 (#515-0112-82), anti-Gr1 (#15-5931-82) and anti-Ter119 (#515-5921-82), together with anti-Ly6A (Sca-1) (#25-5981-81), anti-CD117 (cKit) (#747-1171-82), anti-CD150 (#12-1502-82), anti-CD48 (#17-0481-82) and anti-EPCR (#17-2012-80), were from eBioscience.

Transcriptional profiling and pre-processing. For each sample, total RNA was isolated by TRIzol (Invitrogen) extraction followed by the RNeasy Plus Micro Kit (Qiagen) from ~400,000 SCs pooled from ≥4 mice. Hybridization to GeneChip Mouse Gene 1.0 ST Affymetrix arrays was performed by the Stanford Protein and Nucleic Acid Facility. Raw data files are available at the NCBI GEO database (accession numbers GSE55490 and GSE47177).

Intensities were pre-processed using the Expression Console (Affymetrix) for RMA, and probe sets that cross-hybridized, mapped to multiple transcripts or showed poor signal (intensity < 6.5) in all arrays were excluded. Processed array data are available as Supplementary Data Set 1. Arrays were batch-corrected using ComBat, and technical replicates were averaged. Probe sets lacking gene symbol annotation were excluded, and for transcripts covered by multiple probe sets, the most informative probe set with average intensity in the top two-thirds of sets for that transcript was selected.
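
Two of the filtering rules above can be sketched directly: excluding probe sets below 6.5 in every array, and picking one probe set per transcript from those whose average intensity lies in the top two-thirds. The text does not define "most informative"; this sketch assumes it means largest variance across arrays. Names and intensities are hypothetical.

```python
import math
from statistics import mean, pvariance


def drop_low_signal(probe_sets, threshold=6.5):
    """Exclude probe sets whose log2 RMA intensity is below threshold in every array."""
    return {pid: vals for pid, vals in probe_sets.items()
            if any(v >= threshold for v in vals)}


def pick_probe_set(candidates):
    """Choose one probe set for a transcript covered by several.

    candidates: {probe_id: [intensity per array]}.
    Keeps probe sets whose average intensity is in the top two-thirds for the
    transcript, then returns the one with the largest variance across arrays
    (variance as the 'informativeness' criterion is an assumption).
    """
    ranked = sorted(candidates, key=lambda pid: mean(candidates[pid]), reverse=True)
    keep = ranked[:math.ceil(2 * len(ranked) / 3)]
    return max(keep, key=lambda pid: pvariance(candidates[pid]))


print(drop_low_signal({"x": [6.0, 6.2], "y": [6.0, 7.0]}))  # {'y': [6.0, 7.0]}
print(pick_probe_set({"a": [7, 7, 7], "b": [8, 10, 12], "c": [5, 9, 13]}))  # c
```
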

Transcriptional profile analysis. All analyses were performed on mean-centred log2-transformed expression levels.

For gene annotation analysis, the background gene set was all genes after pre-processing (15,343 genes). For each comparison between conditions, to ensure biological importance and reproducibility, the foreground gene set was selected as the smallest of the following: the gene set with 10% FDR by rank products analysis, all genes with at least 1.5-fold directional change between conditions, or the top 1,100 genes ranked by directional fold change. Enrichment analysis and redundancy grouping of KEGG pathways were performed using GeneTerm Linker with corrected P value < 0.05, minimum genes per term 4 and minimum silhouette 0.5.
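
The foreground rule takes the smallest of three candidate gene sets. A minimal sketch of that selection plus the basic rank product statistic (geometric mean of within-replicate ranks). The permutation step that turns rank products into a 10% FDR set is omitted, so the FDR set is passed in precomputed; names and inputs are hypothetical.

```python
import math


def rank_products(fold_changes_by_replicate):
    """Rank product statistic for up-regulation.

    fold_changes_by_replicate: list of {gene: log2 fold change} dicts, one per
    replicate comparison. Genes are ranked within each replicate (1 = largest
    increase); the rank product is the geometric mean of those ranks, so small
    values mean consistently up-regulated.
    """
    genes = list(fold_changes_by_replicate[0])
    log_rank_sum = dict.fromkeys(genes, 0.0)
    for fc in fold_changes_by_replicate:
        for rank, gene in enumerate(sorted(genes, key=fc.get, reverse=True), start=1):
            log_rank_sum[gene] += math.log(rank)
    k = len(fold_changes_by_replicate)
    return {g: math.exp(s / k) for g, s in log_rank_sum.items()}


def foreground_genes(fdr_genes, log2_fc, min_fold=1.5, top_n=1100):
    """Smallest of: the FDR-selected set, genes with >= min_fold change in the
    direction of interest, or the top_n genes by that directional change.

    log2_fc: {gene: mean log2 fold change}, positive = direction of interest
    (negate the values to select down-regulated genes).
    """
    by_fold = {g for g, v in log2_fc.items() if v >= math.log2(min_fold)}
    top = set(sorted(log2_fc, key=log2_fc.get, reverse=True)[:top_n])
    return min([set(fdr_genes), by_fold, top], key=len)


rp = rank_products([{"a": 2, "b": 1, "c": 0}, {"a": 1, "b": 2, "c": 0}])
fc = {"a": 1.0, "b": 0.2, "c": 0.8}
print(foreground_genes({"a", "c"}, fc, top_n=1))  # {'a'}
```

Taking the smallest set means whichever criterion is strictest for a given comparison governs, so weak comparisons do not yield long, noisy gene lists.
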

K-medians clustering with the k-means++ seeding algorithm and Manhattan distance was performed on KEGG pathway oxidative phosphorylation genes (ID mmu00190) with k = 3. The partition with the smallest sum of intracluster distances was chosen. This partition was validated as being the most tightly clustered (best average sample shadow (0.38) and silhouette (0.61)), robust against different seeds (highest frequency as optimal partition (0.47)), and robust against noise (highest average frequency of reproduced cluster pairs during random projections to lower dimensions (0.80)).
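
A minimal sketch of k-medians with k-means++ seeding and Manhattan distance, repeated over several seeds and keeping the partition with the smallest sum of intracluster distances. Using Manhattan distance inside the D² seeding weights, the restart count and all names are assumptions; the validation metrics (shadow, silhouette, random projections) are not reproduced.

```python
import random
from statistics import median


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: first centre uniform at random; each further centre is
    drawn with probability proportional to D(x)**2, D = distance to nearest centre."""
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        weights = [min(manhattan(p, c) for c in centres) ** 2 for p in points]
        total = sum(weights)
        if total == 0:  # all points coincide with existing centres
            centres.append(list(rng.choice(points)))
            continue
        r, acc = rng.uniform(0, total), 0.0
        for p, w in zip(points, weights):
            acc += w
            if w > 0 and acc >= r:
                centres.append(list(p))
                break
    return centres


def kmedians(points, k, rng, max_iter=100):
    """One k-medians run (Manhattan distance, coordinate-wise median centres)."""
    centres = kmeanspp_seeds(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: manhattan(p, centres[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centre
                centres[j] = [median(col) for col in zip(*members)]
    cost = sum(manhattan(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, cost


def best_kmedians(points, k=3, n_restarts=20, seed=0):
    """Repeat from different seeds; keep the smallest sum of intracluster distances."""
    rng = random.Random(seed)
    return min((kmedians(points, k, rng) for _ in range(n_restarts)), key=lambda run: run[2])


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
labels, centres, cost = best_kmedians(points, k=3)
```

Coordinate-wise medians are the centre that minimizes summed Manhattan distance within a cluster, which is why k-medians pairs naturally with that metric.
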

Statistics. Unless stated otherwise, significance was calculated using two-tailed unpaired Student’s t-tests. Differences were considered statistically significant at the P < 0.05 level.
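
A minimal standard-library sketch of a two-tailed unpaired Student's t-test (pooled variance) with the figure star notation (*P < 0.05, **P < 0.01). The p-value is obtained by numerically integrating the t density, an illustrative choice; `scipy.stats.ttest_ind` with its default `equal_var=True` computes the same test. The "n.s." label and the example data are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=4000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    x = abs(t)
    if x == 0:
        return 1.0
    h = x / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, min(1.0, 1.0 - 2.0 * s * h / 3.0))


def students_t_test(a, b):
    """Two-tailed unpaired Student's t-test with pooled variance. Returns (t, df, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


def stars(p):
    """Star notation from the figure legends; 'n.s.' is a hypothetical label."""
    return "**" if p < 0.01 else "*" if p < 0.05 else "n.s."


t, df, p = students_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(t, 3), df, round(p, 4), stars(p))  # -1.0 8 0.3466 n.s.
```

The pooled-variance form assumes equal variances in both groups; with small n per group (2-13 mice here) that assumption is hard to check, which is worth keeping in mind when reading marginal P values.
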


General methods. Unless stated otherwise, sample sizes (n values) are reported as biological replicates of mice and/or SC isolations from separate mice performed on different days. In figure legends where sample size is reported as a range, the exact sample sizes are as follows:

Fig. 1b: QSCs (n = 13), CSCs (1.5 DPI, n = 3; 2.5 DPI, n = 7; 3.5 DPI, n = 5), ASCs (1.5 DPI, n = 3; 2.5 DPI, n = 5; 3.5 DPI, n = 5).
Fig. 1c: QSCs (n = 11), CSCs (1 DPI, n = 2; 1.5 DPI, n = 2; 2.5 DPI, n = 2; 3.5 DPI, n = 2), ASCs (1 DPI, n = 2; 1.5 DPI, n = 2; 2.5 DPI, n = 2; 3.5 DPI, n = 2).
Fig. 2e: QSCs (n = 4), CSCs (n = 3), ASCs (n = 3).
Fig. 2f: QSCs (n = 5), CSCs (n = 4), ASCs (n = 3).
Fig. 2h: QSCs (n = 7), CSCs (1 DPI, n = 3; 2.5 DPI, n = 3; 3.5 DPI, n = 4).
Fig. 3a: wild-type QSCs (n = 11), TSC1 KO QSCs (n = 6).
Fig. 3d: wild-type QSCs (n = 11), wild-type CSCs (n = 7), Rptor KO QSCs (n = 7), Rptor KO CSCs (n = 6).
Fig. 3g: wild-type QSCs (n = 11), wild-type CSCs (n = 7), cMet KO QSCs (n = 4), cMet KO CSCs (n = 5).
Fig. 4e: normal regeneration (3.5 DPI, n = 6; 6 DPI, n = 5; 11 DPI, n = 3; 15 DPI, n = 3; 24 DPI, n = 4), alert regeneration (3.5 DPI, n = 5; 6 DPI, n = 5; 11 DPI, n = 3; 15 DPI, n = 3; 24 DPI, n = 4).
Fig. 4h: QFAPs (n = 6), CFAPs (each point, n = 2), AFAPs (each point, n = 2).
Fig. 4i: non-injured (n = 4), injured (n = 5).
Fig. 4j: non-injured plus PBS (n = 4), non-injured plus IFN-γ (n = 3), injured plus PBS (n = 3), injured plus IFN-γ (n = 4).

Extended Data Fig. 1c: non-injured (n = 2), injured (R-TA, n = 3; L-TA, n = 3; R-Quad, n = 2; triceps, n = 2).
Extended Data Fig. 3i: n = 4. Extended Data Fig. 3j: n = 6. Extended Data Fig. 3k: n = 3. Extended Data Fig. 3l: n = 5. Extended Data Fig. 3m: n = 6.
Extended Data Fig. 5g: wild-type QSCs (n = 6), wild-type CSCs (n = 4), cMet KO QSCs (n = 3), cMet KO CSCs (n = 6).
Extended Data Fig. 6a: non-injured (n = 6), 2.5 DPI (n = 3), 7 DPI (n = 5), 14 DPI (n = 3), 28 DPI (n = 3).
Extended Data Fig. 6b: QSCs (n = 13), CSCs (1.5 DPI, n = 3; 2.5 DPI, n = 7; 3.5 DPI, n = 5; 7 DPI, n = 2; 14 DPI, n = 3; 21 DPI, n = 2; 35 DPI, n = 2), ASCs (1.5 DPI, n = 3; 2.5 DPI, n = 5; 3.5 DPI, n = 5; 7 DPI, n = 2; 14 DPI, n = 3; 21 DPI, n = 2; 35 DPI, n = 2).
Extended Data Fig. 6c: QSCs (n = 11), CSCs (2.5 DPI, n = 2; 7 DPI, n = 2; 14 DPI, n = 2; 21 DPI, n = 2; 28 DPI, n = 2; 35 DPI, n = 6), ASCs (2.5 DPI, n = 2; 7 DPI, n = 2; 14 DPI, n = 2; 21 DPI, n = 2; 28 DPI, n = 2; 35 DPI, n = 4).
Extended Data Fig. 9c: QFAPs (n = 6), CSCs (each point, n = 2), ASCs (each point, n = 2).
Extended Data Fig. 9d: non-injured (n = 4), injured (n = 5).

In most cases, the data presented were compiled over the course of 2 years, as mice with the appropriate genotype became available. Therefore, the magnitude of the effect and the variability in the measurements were the primary factors in determining sample size and replication of data. Although samples were not explicitly randomized or blinded, mouse identification numbers were used as sample identifiers, and thus the genotypes and experimental conditions of each mouse or sample were not readily known or available to the experimenters during sample processing and data collection. The only criteria used to exclude samples involved the health of the animals, such as visible wounds from fighting. In these cases, the animals were handled in accordance with approved IACUC guidelines.
Extended Data Figure 1 | SCs distant from the site of an injury display a functional response to the injury. a, Representative FACS plot from isolation of EYFP+ SCs from 10-week-old Pax7CreER/+; Rosa26eYFP/+ mice 3 weeks following TMX treatment. Mononuclear cells from muscle digests were gated in FITC and Pac-Blue (autofluorescence) channels to separate EYFP+ SCs. EYFP+ SCs were usually 2-4% of all events from muscle digestions. b, Progeny of CSCs and QSCs take comparable times to complete the second cell division. Analysis of the time required to complete the second division (QSCs 10.2 ± 2 h, n = 148 cells; CSCs 10.9 ± 2 h, n = 155), following the first cell cycle (Fig. 1d), shows that the accelerated cell cycle kinetics of CSCs are limited to the first division. c, SCs throughout the body increase in propensity to cycle in response to injury. In injured animals, SCs isolated from the indicated muscle groups show a higher frequency of BrdU incorporation than SCs from the same muscle groups of non-injured mice (n = 2 animals). d, Muscle crush injuries increase the in vivo cycling propensity of CSCs. Twelve hours after BrdU pulse labelling, SCs isolated from TA and Gas muscles contralateral to a muscle crush injury show elevated BrdU labelling frequency versus SCs from those muscles of non-injured mice (mean ± s.e.m.; non-injured, n = 5 animals; muscle crush, n = 3; **P < 0.01). e, SCs contralateral to a muscle crush injury have increased cell cycle entry kinetics. 2.5 DPI SCs contralateral to a muscle crush injury incorporate EdU more rapidly than QSCs when cultured ex vivo for 40 h (mean ± s.e.m., n = 3 animals, *P < 0.05).
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Extended Data Figure 2 | CSCs are distinct from QSCs but retain stem cell 
characteristics. a, CSCs are slightly larger than QSCs and much smaller than 
ASCs. Immediately after isolation, analysis of cell diameters of QSCs, 2.5 

DPI CSCs and 2.5 DPI ASCs, measured by phase contrast microscopy, shows 
that CSCs have a distribution that is shifted to the right compared to QSCs 
(histographic representation of data displayed in Fig. 2a, b). b, CSCs are larger 
than QSCs as measured by the FSC parameter by FACS (representative FACs 
plot, similar results observed in 4 independent experiments). c, CSCs have 
elevated intensity of an EYFP reporter. FACS analysis of EYFP intensity in the 
FITC channel shows that 2.5 DPI CSCs display a slight shift in EYFP 
distribution relative to QSCs, suggesting increased expression of this reporter 
from the Rosa26 locus (representative FACS plot, similar results observed in 4 
independent experiments). d, CSCs show elevated levels of pyronin Y staining, 
suggesting an increased RNA content relative to QSCs, but substantially less 
than ASCs (representative FACS plot, similar results observed in 4 independent 
experiments). e, CSCs increase global transcriptional activity compared with 
QSCs. FACS analysis of EU incorporation, following pulse labelling by i.p. 
injection, shows that 2.5 DPI CSCs have higher levels of EU nucleotide 
incorporation than QSCs, whereas ASCs show markedly elevated 
incorporation. f-i, Immunocytochemical (ICC) staining of QSCs, 2.5 DPI 
CSCs and 2.5 DPI ASCs immediately after isolation shows that CSCs are highly 
similar to QSCs in expression of the QSC marker Pax7 (f), as well as markers of 
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SC activation, MyoD (g) and Kié67 (h), and myogenic differentiation, MyoG 
(i) (mean = s.e.m.; 1 = 4 animals; *P < 0.05, **P < 0.01). j, CSCs have 
comparable ability to engraft as QSCs. EYFP* QSCs and 2.5 DPI CSCs were 
isolated from donor mice (Pax7""’*; Rosa26""*"’*). A total of 5 X 104 
EYFP* QSCs were transplanted into the left TAs and 5 X 10* EYFP* CSCs 
were transplanted into the right TAs of host NSG mice. Two weeks after 
transplantation, EYEP* SCs were isolated from TA muscles of host mice and 
SC engraftment efficiency was measured as the number of EYFP* SCs that 
were recovered as a percentage of the number of donor SCs that were 
transplanted (n = 4, red line indicates mean). For both donor cell populations, 
greater than 95% of SCs recovered were found to be Pax7* as measured by 
ICC (data not shown). k, CSCs that incorporate BrdU self-renew. Following 
injury to one TA muscle, mice were administered BrdU continuously for 4 days 
followed by 21 days of chase (as shown in the diagram). IF-IHC analysis of
the TA contralateral to the injury revealed BrdU+ Pax7+ cells in the satellite cell
position beneath the basal lamina. An example of such a cell is illustrated
here (top row of images). On the right is quantification of BrdU+ SCs after
21 days of chase by ICC after FACS isolation, showing that CSCs have
self-renewal capacity similar to that of QSCs (mean ± s.e.m., n = 3 animals,
*P < 0.05). Below is an example of a BrdU+ myonucleus in the contralateral
TA after 21 days of chase, suggesting that CSCs can also fuse with the adjacent 
fibre following proliferation. 
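The engraftment efficiency in panel j is the number of recovered EYFP+ SCs divided by the number of donor SCs transplanted, expressed as a percentage. Below is a minimal Python sketch of that ratio. The function name and the example counts are hypothetical and are not values read from the figure.

```python
def engraftment_efficiency(recovered_eyfp_scs: int, transplanted_donor_scs: int) -> float:
    """Return the percentage of transplanted donor SCs recovered as EYFP+ SCs.

    recovered_eyfp_scs: EYFP+ SCs isolated from the host TA after transplantation.
    transplanted_donor_scs: number of donor SCs injected into that TA.
    """
    if transplanted_donor_scs <= 0:
        raise ValueError("the number of transplanted cells must be positive")
    if recovered_eyfp_scs < 0:
        raise ValueError("the number of recovered cells cannot be negative")
    return 100.0 * recovered_eyfp_scs / transplanted_donor_scs


# Hypothetical example: 5 x 10^4 donor cells injected, 1,500 EYFP+ SCs recovered.
print(engraftment_efficiency(1_500, 50_000))  # 3.0
```

Because each TA receives the same number of donor cells, the efficiency can be compared directly between the QSC-injected (left) and CSC-injected (right) muscles of the same host.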


©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 3 panels: a, annotation groups enriched in CSCs versus QSCs, with corrected P-value and fold-enrichment columns; cell cycle: Cell cycle (4110), Oocyte meiosis (4114), Progesterone-mediated oocyte maturation (4914); mitochondrial metabolism: Oxidative phosphorylation (190), Huntington's disease (5016), Parkinson's disease (5012), Alzheimer's disease (5010), Cardiac muscle contraction (4260). b, cell volume of QSCs, CSCs and ASCs. d, luciferase signal in QSCs, CSCs and ASCs. f, western blot, each lane 1 × 10^5 cells. g, pS6 flow cytometry, APC (pS6-Alexa-647) for QSCs, CSCs and ASCs.]


Extended Data Figure 3 | CSCs have elevated mitochondrial and mTORC1 
activity. a, Induction of genes involved in the cell cycle and mitochondrial 
metabolism in CSCs. Pathway analysis of genes that were induced in CSCs 
versus QSCs showed enrichment of genes involved in the cell cycle and 
mitochondrial metabolism. Redundant KEGG pathways that contain 
overlapping genes were assembled into annotation groups (details of array and 
enrichment analysis are found in the Methods section). b, CSCs have slightly 
increased cell volume compared to QSCs. Cell volume was calculated from 
cell size measurements (Fig. 2b) (mean ± s.e.m., n = 4 animals, *P < 0.05,
**P < 0.01 compared to QSCs). c, CSCs have a slightly greater intracellular
ATP concentration than QSCs (mean ± s.e.m., n = 4 animals, *P < 0.05
compared to QSCs). d, Increase in photon emission from CSCs expressing a
luciferase reporter (LuSEAP). Immediately after isolation and plating,
bioluminescence imaging of 1 × 10^4 Pax7^CreER/+; Rosa26^LuSEAP/+ SCs shows
that 2.5 DPI CSCs have greater luminescence than QSCs, and that ASCs have
substantially elevated luminescence. Activated fibro-adipogenic progenitors
(AFAPs) were isolated from the same injured muscle as ASCs and plated as a 
negative control for LuSEAP expression. Light emission from luciferase is 
dependent on the amounts of luciferase enzyme, ATP and luciferin. Increased 
ATP and increased expression from the Rosa26 locus in CSCs (Fig. 2 and
Extended Data Fig. 2c) could both contribute to increased luminescence. Data
presented are from a representative experiment with similar results observed in
two independent experiments. e, Low-magnification image of IF-IHC staining
of TA muscle. Boxed areas mark the representative pS6− and pS6+ SCs that are
shown in Fig. 2g. f, CSCs have increased levels of pS6 as shown by western
blot analysis of whole-cell extracts from 1 × 10^5 cells of each population
collected immediately after isolation. g, CSCs show a bimodal distribution of
pS6 staining at 1 DPI, with peaks corresponding to the signal in pS6− QSCs and
pS6+ ASCs when analysed by FACS (representative FACS plot, similar results
observed in 3 independent experiments). h, Sorting SCs for properties of the 
alert state (that is, high levels of MitoTracker Deep Red (MTDR) staining and 
YFP expression) enriches for SCs that display the other properties of alert SCs: 
elevated mTORC1 activity, reduced time to first division and increased
propensity to cycle. Representative gating of MTDR^hi;EYFP^hi SCs (Hi) and
MTDR^lo;EYFP^lo SCs (Lo). i-m, Sorting of Hi SCs reveals a sub-population of
QSCs that displays characteristics of the alert state. Hi SCs have increased
mTORC1 activity (i), an increased propensity to cycle in vivo as measured
by incorporation of EdU nucleotide 12 h after pulse labelling (j), and an
accelerated time to first division (k). Both Hi and Lo SCs stain positive for
the SC marker Pax7 (l). At 12 h after an in vivo EdU pulse, most SCs that
incorporate nucleotide (quantified in j) stain positive for pS6 (m). Panels
i-m are displayed as mean ± s.e.m., n = 3 animals, *P < 0.05, **P < 0.01.
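Panel b states only that cell volume was calculated from cell size measurements, and the exact formula is not given here. One common approach, assumed in this sketch rather than taken from the Methods, treats each cell as a sphere of measured diameter d, so that V = πd³/6. Because V scales with d³, a small shift in cell size produces a larger relative change in volume. The function names and diameters below are hypothetical.

```python
import math


def spherical_volume(diameter: float) -> float:
    """Volume of a sphere with the given diameter: V = pi * d**3 / 6 (units of d cubed)."""
    if diameter < 0:
        raise ValueError("diameter cannot be negative")
    return math.pi * diameter ** 3 / 6.0


def volume_fold_change(diameter: float, reference_diameter: float) -> float:
    """Ratio of two spherical volumes; equal to (d / d_ref) ** 3."""
    return spherical_volume(diameter) / spherical_volume(reference_diameter)


# Hypothetical diameters (um): a 5% larger diameter corresponds to ~16% more volume.
print(round(volume_fold_change(10.5, 10.0), 3))  # 1.158
```

The cubic scaling is why a modest size shift of CSCs relative to QSCs on a FACS plot can correspond to a noticeably larger difference in calculated volume.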
Extended Data Figure 4 | TSC1 KO SCs show induction of pS6 and
increased cell size. a, TSC1 KO increases SC pS6 levels. IF-IHC staining shows 
no pS6 staining of SCs in wild-type TA muscle and strong staining of SCs in 
TSC1 KO TA (representative images of low-magnification muscle section, 
numbered boxed regions are shown in high magnification below). b, Levels of 
pS6 in SC-specific KO models. TSC1 KO SCs show induction of pS6 when 
compared to wild-type QSCs, whereas Rptor KO QSCs and CSCs show no 
detectable pS6. cMet KO QSCs show levels of pS6 comparable to those of wild-type
QSCs. However, unlike wild-type CSCs, cMet KO CSCs show no induction
of pS6. Displayed is western blot analysis of whole-cell extracts from 1 × 10^5
cells of each population/genetic model collected immediately after isolation.
The first three lanes (WT: QSCs, CSCs and ASCs) are the same as in Extended
Data Fig. 3f and are redisplayed for comparison. c, TSC1 KO SCs
are larger than wild-type SCs (representative FACS plot, similar results 
observed in 4 independent experiments). 
Extended Data Figure 5 | Rptor and cMet KO SCs contralateral to injury
display no ‘alerting’ response. a, Depletion of Rptor protein in Rptor KO SCs.
ICC staining of EYFP+ SCs cultured for 40 h after isolation shows that Rptor
protein is undetectable in Rptor KO SCs but clearly detectable in wild-type SCs.
b, Absence of pS6 in Rptor KO SCs. ICC staining shows that after 40 h in
culture, EYFP+ wild-type SCs stain strongly pS6+ whereas EYFP+ Rptor
KO SCs do not exhibit any detectable pS6 signal. c, PCR verification of Rptor
exon 6 excision in Rptor KO SCs. Using primers flanking the floxed exon 6
of the Rptor genomic locus, PCR analysis of genomic DNA from SCs isolated
from a Rptor conditional KO animal (Rptor^fl/fl; Pax7^CreER/+; Rosa26^EYFP/+) shows
efficient recombination of the floxed allele, whereas analysis of genomic
DNA from SCs from a wild-type animal (Rptor^+/+; Pax7^CreER/+; Rosa26^EYFP/+)
and FAPs from a Rptor conditional KO animal does not show recombination.
d, FACS analysis reveals that Rptor KO SCs are slightly smaller and display
a slight leftward shift in FSC distribution relative to wild-type SCs. e, Rptor KO
SCs do not enlarge in response to contralateral injury. At 2.5 DPI, Rptor KO CSCs
show an FSC distribution nearly identical to that of Rptor KO QSCs and do not
increase in size in response to contralateral injury, as wild-type CSCs do (d).
a-e, Representative data; similar results observed in at least 3 independent
experiments. f, cMet is required for phosphorylation of S6 by HGF. In culture,
wild-type SCs show a robust increase in the frequency of pS6+ SCs in
response to a 1 h stimulation with HGF, whereas cMet KO SCs show no change
in pS6 staining frequency (mean ± s.e.m., n = 4, **P < 0.01). g, cMet KO
prevents induction of pS6 in SCs contralateral to injury as measured by IF-IHC
(mean ± s.e.m.; n = 3 animals, ≥ 50 Pax7+ SCs analysed from each animal;
*P < 0.05). h, cMet KO CSCs do not change in size. FACS analysis shows
that cMet KO and wild-type QSCs have similar FSC distributions and that
this distribution is not altered in cMet KO SCs contralateral to an injury
(a representative FACS plot is shown; similar results were observed in 3
independent experiments).




[Extended Data Figure 6 panels: a, pS6 in situ, percentage of pS6+ SCs from 0 to 28 DPI; b, c, ASCs and CSCs plotted against DPI; d, transcriptional profile analysis by PCA (PC1 score, 85% of variance; PC2 score, 10% of variance) with Pearson's r values.]


Extended Data Figure 6 | The functional properties of alert CSCs revert
to the QSC state by 28 DPI. a, The frequency of pS6+ CSCs returns to
non-injured levels by 28 DPI. Quantification of the percentage of pS6+ SCs by
IF-IHC shows that immediately following injury, most CSCs (orange bars)
become pS6+. The frequency of pS6+ CSCs decreases to levels observed in
non-injured animals (black bar) by 28 DPI (mean ± s.e.m., n = 3 animals, > 50
Pax7+ SCs analysed from each animal, **P < 0.01 versus non-injured). b, The
propensity of CSCs to cycle returns to the level of QSCs several weeks after
injury. At various times after injury, mice were given an i.p. injection of BrdU.
SCs were isolated 12 h later from the injured muscles (ASCs) or from the
contralateral muscles (CSCs). The frequency of BrdU incorporation returned to
QSC levels (dashed line) by approximately 21 days after injury for both ASCs
and CSCs (mean ± s.e.m.; n = 3 animals). c, The cell cycle entry kinetics of CSCs
return to the level of QSCs several weeks after injury. At various times after
injury, SCs or their progeny were isolated from the injured muscles (ASCs) or
from the contralateral muscles (CSCs) and cultured in vitro for 40 h in the
presence of EdU. The frequency of EdU incorporation returned to QSC
levels (dashed line) by several weeks after injury for both ASCs and CSCs
(mean ± s.e.m., n = 2 animals). d, CSCs isolated 28 DPI have a transcriptional
profile very similar to QSCs as shown by PCA and Pearson’s r value. 
Transcriptome analysis was performed as in Fig. 2c, with the addition of data 
from CSCs 28 DPI. 
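Panel d compares whole-transcriptome profiles using Pearson's r. The sketch below shows the coefficient for two expression vectors. It is an illustration only: the function name and the five-gene values are hypothetical, and it is not the array-processing pipeline described in the Methods.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length (at least 2 genes)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations share the same 1/(n - 1) factor, so it cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / math.sqrt(ss_x * ss_y)


# Toy log-expression values for five genes in two hypothetical samples.
qsc_profile = [2.0, 4.0, 6.0, 8.0, 10.0]
csc_profile = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(qsc_profile, csc_profile), 3))
```

A value of r close to 1 means the two profiles rise and fall together across genes. This is the sense in which CSCs at 28 DPI are described as "very similar" to QSCs.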
Extended Data Figure 7 | The ability to adopt the alert state strongly
correlates with expression of genes involved in mitochondrial metabolism. 
a, Pathway analysis (as performed in Extended Data Fig. 3a) of the genes 
induced in TSC1 KO QSCs compared to wild-type QSCs shows that genes 
involved in mitochondrial metabolism are significantly enriched. b, c, Pathway 
analyses of the genes induced in Rptor KO CSCs compared to Rptor KO QSCs 
(b) and cMet KO CSCs compared to cMet KO QSCs (c) show that genes 
involved in mitochondrial metabolism are not enriched. d, Expression of genes 
involved in oxidative phosphorylation (KEGG ID mmu00190) is coupled 
with the alert state. Heat map of the expression of genes in the oxidative 
phosphorylation pathway shows that models of the alert state (CSCs and TSC1 
KO QSCs) have elevated expression of these genes and that models of non-alert 
SCs (QSCs, Rptor KO SCs and cMet KO SCs) have low expression of these 
genes. Hierarchical clustering (Euclidean distance, complete linkage) shows
that models of the alert state (CSCs and TSC1 KO SCs) cluster together and that 
models of non-alert SCs (QSCs, Rptor KO SCs and cMet KO SCs) form another 
cluster. e, Centroid-based clustering using oxidative phosphorylation genes 
(KEGG ID mmu00190) shows that grouping SCs into three clusters reveals 
an ‘alert’ cluster (wild-type CSCs and TSC1 KO QSCs), a ‘non-alert’ cluster 
(QSCs, CSCs 28 DPI, Rptor KO QSCs and CSCs and cMet KO QSCs and 
CSCs), and an ‘activated’ cluster (ASCs). Ellipses of dispersion show standard 
deviation (radius) and mean (centre) for each cluster using the first two 
components from PCA. Combined, these data show that induction of genes 
involved in mitochondrial metabolism strongly and consistently correlates with 
the ability to adopt the alert state: wild-type CSCs and TSC1 KO QSCs are alert, and
Rptor KO and cMet KO CSCs are not alert. 
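The hierarchical clustering in panel d uses Euclidean distance with complete linkage. Under complete linkage, the distance between two clusters is the largest distance between any pair of their members. Below is a minimal pure-Python sketch of that agglomerative procedure. The sample names come from the figure, but the two-gene expression values are invented for illustration, and this is not the software used to produce the heat map.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def complete_linkage_clusters(profiles: Dict[str, List[float]], n_clusters: int) -> List[Set[str]]:
    """Agglomerative clustering with Euclidean distance and complete linkage.

    Starts with one cluster per sample and repeatedly merges the two clusters whose
    complete-linkage distance is smallest, stopping when n_clusters remain.
    """
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of samples")
    clusters: List[Set[str]] = [{name} for name in profiles]

    def linkage(c1: Set[str], c2: Set[str]) -> float:
        # Complete linkage: the largest Euclidean distance between members of the two clusters.
        return max(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Invented two-gene values for illustration; only the sample names come from the figure.
toy_profiles = {
    "QSC": [1.0, 1.1],
    "Rptor KO": [1.1, 1.0],
    "cMet KO": [0.9, 1.0],
    "CSC": [3.0, 3.1],
    "TSC1 KO": [3.1, 2.9],
}
for cluster in complete_linkage_clusters(toy_profiles, 2):
    print(sorted(cluster))
```

Complete linkage tends to produce compact clusters, because a sample joins a group only if it is close to every member rather than just the nearest one.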
Extended Data Figure 8 | SCs enter the alert state in response to many types
of injuries. a, Cultures of CSCs differentiate more quickly than do cultures of
QSCs (representative ICC staining of MyoG, data quantified in Fig. 4a, b).
b, SCs enter the alert state in response to injuries to non-muscle tissue. SCs
contralateral to a tibial fracture (bone inj) and SCs in an animal that received a
skin wound on the abdomen (skin inj) show an increased propensity to cycle in vivo
(mean ± s.e.m.; non-injured, n = 5 animals; bone injured, n = 2; skin
injured, n = 6; **P < 0.01 versus non-injured). c, SCs show accelerated cell cycle
entry kinetics in response to non-muscle injuries. SCs contralateral to a tibial
fracture injury and SCs from mice that received a skin injury have an increased
frequency of EdU incorporation when cultured for 40 h ex vivo compared to
SCs from non-injured animals (mean ± s.e.m.; n = 3 animals; *P < 0.05 versus
non-injured).
Extended Data Figure 9 | FAPs and LT-HSCs adopt an alert state in
response to muscle injury. a, Increased frequency of pS6+ FAPs contralateral
to muscle injury. Representative IF-IHC staining of TA muscle from a
non-injured animal (top) or contralateral to injury (bottom) shows that the
frequency of pS6+ (PDGFRα+;CD31−) FAPs is increased in contralateral
muscle (data are quantified in Fig. 4g). Labelled boxes indicate regions for
which higher magnification is displayed. b, CFAPs increase in size. FACS
analysis shows that 2.5 DPI CFAPs display a rightward shift in FSC distribution compared to
QFAPs; AFAPs show a greater increase in size (a representative FACS plot is
shown; similar results were observed in 3 independent experiments). c, CFAPs
show an increased propensity to cycle. Twelve hours following an i.p. injection of BrdU,
CFAPs isolated at the indicated DPIs have an elevated frequency of BrdU
incorporation compared to QFAPs (0 DPI). d, Muscle injury increases the
frequency of phospho-mTOR+ (pmTOR+) LT-HSCs. FACS analysis of pmTOR
in Lineage−, Sca-1+, cKit+, CD150+ HSCs isolated from bone marrow 1 DPI
showed that LT-HSCs induce mTORC1 signalling in response to muscle
injury (mean ± s.e.m.; n = 4; *P < 0.05).
The metabolite α-ketoglutarate extends lifespan by
inhibiting ATP synthase and TOR
Metabolism and ageing are intimately linked. Compared with ad
libitum feeding, dietary restriction consistently extends lifespan and 
delays age-related diseases in evolutionarily diverse organisms’. Sim- 
ilar conditions of nutrient limitation and genetic or pharmacological 
perturbations of nutrient or energy metabolism also have longevity 
benefits**. Recently, several metabolites have been identified that 
modulate ageing”*; however, the molecular mechanisms underlying 
this are largely undefined. Here we show that α-ketoglutarate (α-KG),
a tricarboxylic acid cycle intermediate, extends the lifespan of adult
Caenorhabditis elegans. ATP synthase subunit β is identified as a
novel binding protein of α-KG using a small-molecule target
identification strategy termed drug affinity responsive target stability
(DARTS)’. The ATP synthase, also known as complex V of the
mitochondrial electron transport chain, is the main cellular energy-
generating machinery and is highly conserved throughout evolution*”.
Although complete loss of mitochondrial function is detrimental,
partial suppression of the electron transport chain has been shown
to extend C. elegans lifespan’® '°. We show that α-KG inhibits ATP
synthase and, similar to ATP synthase knockdown, inhibition by
α-KG leads to reduced ATP content, decreased oxygen
consumption, and increased autophagy in both C. elegans and mammalian
cells. We provide evidence that the lifespan increase by α-KG requires
ATP synthase subunit β and depends on the downstream target of
rapamycin (TOR). Endogenous α-KG levels are increased upon
starvation and α-KG does not extend the lifespan of dietary-restricted
animals, indicating that α-KG is a key metabolite that mediates
longevity by dietary restriction. Our analyses uncover new molecular
links between a common metabolite, a universal cellular energy gen- 
erator and dietary restriction in the regulation of organismal lifespan, 
thus suggesting new strategies for the prevention and treatment of 
ageing and age-related diseases. 

To gain insight into the regulation of ageing by endogenous small mol- 
ecules, we screened normal metabolites and aberrant disease-associated 
metabolites for their effects on adult lifespan using the C. elegans model. 
We discovered that the tricarboxylic acid (TCA) cycle intermediate 
α-KG (but not isocitrate or citrate) delays ageing and extends the
lifespan of C. elegans by ~50% (Fig. 1a and Extended Data Fig. 1a). In the
cell, α-KG (or 2-oxoglutarate; Fig. 1b) is produced from isocitrate by
oxidative decarboxylation catalysed by isocitrate dehydrogenase (IDH).
α-KG can also be produced anaplerotically from glutamate by oxidative
deamination using glutamate dehydrogenase, and as a product of
pyridoxal phosphate-dependent transamination reactions in which glutamate
is a common amino donor. α-KG extended the lifespan of wild-type N2
worms in a concentration-dependent manner, with 8 mM α-KG
producing the maximal lifespan extension (Fig. 1c); 8 mM was the
concentration used in all subsequent C. elegans experiments. There is a ~50%
increase in α-KG concentration in worms on 8 mM α-KG plates
compared with those on vehicle plates (Extended Data Fig. 1b), or ~160 μM
versus ~110 μM assuming homogeneous distribution (Methods). α-KG
not only extends lifespan but also delays age-related phenotypes, such
as the decline in rapid, coordinated body movement (Supplementary
Videos 1 and 2). α-KG supplementation in the adult stage is sufficient
for longevity (Extended Data Fig. 1c).

The dilution or killing of the C. elegans bacterial food source has been 
shown to extend worm lifespan", but the lifespan increase by α-KG is
not due to altered bacterial proliferation or metabolism (Fig. 1d, e and
Extended Data Fig. 1d). Animals also did not view α-KG-treated food as
less favourable (Extended Data Fig. 1e, f), and there was no significant
change in food intake, pharyngeal pumping, foraging behaviour, body
size or brood size in the presence of α-KG (Extended Data Fig. 1e-h;
data not shown). 

In the cell, α-KG is decarboxylated to succinyl-CoA and CO2 by α-KG
dehydrogenase (encoded by ogdh-1), a key control point in the TCA cycle.
Increasing α-KG levels by ogdh-1 RNA interference (RNAi) (Extended
Data Fig. 1b) also extends worm lifespan (Fig. 1f and Supplementary
Notes), consistent with a direct effect of α-KG on longevity
independent of bacterial food.

To investigate the molecular mechanism(s) of longevity by α-KG,
we took advantage of an unbiased biochemical approach, DARTS’. As
we reasoned that key target(s) of α-KG are likely to be conserved and
ubiquitously expressed, we used an easily cultured human cell line (Jurkat)
as the protein source for DARTS (Fig. 2a). Mass spectrometry
identified ATP5B, the β subunit of the catalytic core of the ATP synthase,
among the most abundant and enriched proteins present in the
α-KG-treated sample (Extended Data Table 1); the homologous α subunit ATP5A
was also enriched but to a lesser extent. The interaction between α-KG and
ATP5B was verified using additional cell lines (Fig. 2b; data not shown), and 
corroborated for the C. elegans orthologue ATP-2 (Extended Data Fig. 2a). 

Figure 1 | α-KG extends the adult lifespan of C. elegans. a, α-KG extends the
lifespan of adult worms in the metabolite longevity screen. All metabolites
were given at a concentration of 8 mM. b, Structure of α-KG. c, Dose-response
curve of the α-KG effect on longevity. d, e, α-KG extends the lifespan of worms
fed bacteria that have been ampicillin-arrested: mean lifespan (days of
adulthood) with vehicle treatment m(veh) = 19.4 (n = 80 animals tested),
m(α-KG) = 25.1 (n = 91), P < 0.0001 (log-rank test) (d); or γ-irradiation-killed:
m(veh) = 19.0 (n = 88), m(α-KG) = 23.0 (n = 46), P < 0.0001 (log-rank test) (e).
OP50, E. coli OP50 strain. f, α-KG does not further extend the lifespan of ogdh-1
RNAi worms, m(veh) = 21.2 (n = 98), m(α-KG) = 21.1 (n = 100), P = 0.65
(log-rank test).

α-KG inhibits the activity of complex V, but not complex IV, from
bovine heart mitochondria (Fig. 2c and Extended Data Fig. 2b; data not
shown). This inhibition is also readily detected in live mammalian cells
(Fig. 2d; data not shown) and in live nematodes (Fig. 2e), as evidenced
by reduced ATP levels. Concomitantly, oxygen consumption rates are
lowered (Fig. 2f, g), as they are with atp-2 knockdown (Extended Data
Fig. 2c). Specific inhibition of complex V, but not of the other electron
transport chain (ETC) complexes, by α-KG is further confirmed by
respiratory control analysis’* (Fig. 2h and Extended Data Fig. 2d-h).
To understand the mechanism of inhibition by α-KG, we studied the
enzyme inhibition kinetics of ATP synthase. α-KG (released from octyl
α-KG) decreases both the effective velocity of the enzyme-catalysed
reaction at an infinite concentration of the substrate (Vmax) and the
Michaelis constant (Km) of ATP synthase, indicative of uncompetitive
inhibition (Fig. 2i and Supplementary Notes).
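A purely uncompetitive inhibitor binds only the enzyme-substrate complex. Its rate law is v = Vmax[S]/(Km + α′[S]) with α′ = 1 + [I]/K′i, so the apparent Vmax and the apparent Km are both divided by the same factor α′. On an Eadie-Hofstee plot (v against v/[S]), the intercept is the apparent Vmax and the slope is minus the apparent Km. The Python sketch below simulates noise-free data under this textbook model and recovers the apparent constants with an ordinary least-squares line. All parameter values and function names are hypothetical. The Fig. 2i legend reports a nonlinear least-squares fit, so this linearized fit is only an illustration.

```python
from typing import List, Sequence, Tuple


def uncompetitive_rate(s: float, vmax: float, km: float, inhibitor: float, ki_prime: float) -> float:
    """Initial velocity with an uncompetitive inhibitor: v = Vmax*[S] / (Km + alpha'*[S])."""
    alpha_prime = 1.0 + inhibitor / ki_prime
    return vmax * s / (km + alpha_prime * s)


def eadie_hofstee_fit(substrate: Sequence[float], velocity: Sequence[float]) -> Tuple[float, float]:
    """Fit v = Vmax_app - Km_app * (v / [S]) by ordinary least squares.

    Returns (Vmax_app, Km_app): the intercept and the negated slope of v against v/[S].
    """
    x = [v / s for s, v in zip(substrate, velocity)]
    y = list(velocity)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, -slope


# Hypothetical parameters: Vmax = 50, Km = 25, and [I] = K'i when inhibited, so alpha' = 2.
substrate_levels: List[float] = [5.0, 10.0, 20.0, 40.0, 80.0]
for inhibitor in (0.0, 100.0):
    rates = [uncompetitive_rate(s, 50.0, 25.0, inhibitor, 100.0) for s in substrate_levels]
    vmax_app, km_app = eadie_hofstee_fit(substrate_levels, rates)
    # Both apparent constants fall by alpha', so Vmax_app / Km_app stays at 2.0.
    print(f"[I] = {inhibitor:5.1f}: Vmax_app = {vmax_app:.2f}, Km_app = {km_app:.2f}")
```

The joint decrease of both constants is the diagnostic feature. A competitive inhibitor would instead raise the apparent Km and leave Vmax unchanged.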

Figure 2 | α-KG binds and inhibits ATP synthase. a, DARTS identifies
ATP5B as an α-KG-binding protein. Red arrowhead, protected band.
b, DARTS confirms α-KG binding specifically to ATP5B. IB, immunoblot.
c, Inhibition of ATP synthase by α-KG (released from octyl α-KG;
Supplementary Notes). This inhibition was reversible (data not shown).
d, e, Reduced ATP levels in octyl α-KG-treated normal human fibroblasts
(**P = 0.0016, ****P < 0.0001; by t-test, two-tailed, two-sample unequal
variance) (d) and α-KG-treated worms (day 2, P = 0.969; day 8, *P = 0.012;
by t-test, two-tailed, two-sample unequal variance) (e). RLU, relative
luminescence units. f, g, Decreased oxygen consumption rate (OCR) in octyl
α-KG-treated cells (***P = 0.0004, ****P < 0.0001; by t-test, two-tailed,
two-sample unequal variance) (f) and α-KG-treated worms (P < 0.0001;
by t-test, two-tailed, two-sample unequal variance) (g). h, α-KG, released from
octyl α-KG (800 μM), decreases state 3 respiration, but not state 4o or 3u
respiration (P = 0.997), in mitochondria isolated from mouse liver. The respiratory control
ratio is decreased in the octyl α-KG-treated (3.1 ± 0.6) versus vehicle-treated
mitochondria (5.2 ± 1.0) (*P = 0.015; by t-test, two-tailed, two-sample
unequal variance). Oligo, oligomycin; FCCP, carbonyl cyanide-4-(trifluoromethoxy)phenylhydrazone;
AA, antimycin A. i, Eadie-Hofstee plot
of steady-state inhibition kinetics of ATP synthase by α-KG (produced by
in situ hydrolysis of octyl α-KG). [S] is the substrate (ADP) concentration, and
V is the initial velocity of ATP synthesis in the presence of 200 μM octanol
(vehicle control) or octyl α-KG. α-KG (produced from octyl α-KG) decreases
the apparent Vmax (53.9 to 26.7) and Km (25.9 to 15.4), by nonlinear regression
least-squares fit. c-i, Results were replicated in two independent experiments.
Mean ± standard deviation (s.d.) is plotted in all cases.

Figure 3 | α-KG longevity is mediated through ATP synthase and the
dietary restriction/TOR axis. a-g, Effect of α-KG on the lifespan of mutant or
RNAi worms. a, atp-2 RNAi, m(veh) = 22.8 (n = 97), m(α-KG) = 22.5 (n = 94),
P = 0.35; or RNAi control, m(veh) = 18.6 (n = 94), m(α-KG) = 23.4 (n = 91),
P < 0.0001. b, daf-2(e1370), m(veh) = 38.0 (n = 72), m(α-KG) = 47.6 (n = 69),
P < 0.0001. c, eat-2(ad1116), m(veh) = 22.8 (n = 59), m(α-KG) = 22.9 (n = 40),
P = 0.79. d, let-363 RNAi, m(veh) = 25.1 (n = 96), m(α-KG) = 25.7 (n = 74),
P = 0.95; or gfp RNAi control, m(veh) = 20.2 (n = 99), m(α-KG) = 27.7 (n = 81),
P < 0.0001. e, daf-16(mu86), m(veh) = 13.4 (n = 71), m(α-KG) = 17.4 (n = 72),
P < 0.0001; or N2, m(veh) = 13.2 (n = 100), m(α-KG) = 22.3 (n = 104), P < 0.0001.
f, pha-4(zu225), m(veh) = 14.2 (n = 94), m(α-KG) = 13.5 (n = 109), P = 0.55.
g, hif-1(ia4), m(veh) = 20.5 (n = 85), m(α-KG) = 26.0 (n = 71), P < 0.0001; or N2,
m(veh) = 21.5 (n = 101), m(α-KG) = 24.6 (n = 102), P < 0.0001. P values were
determined by the log-rank test. Number of independent experiments:
RNAi control (6), atp-2 (2), let-363 (3), N2 (5), daf-2 (2), eat-2 (2), pha-4 (2),
daf-16 (2), hif-1 (5).

To determine the significance of ATP-2 to the longevity by α-KG,
we measured the lifespan of atp-2 RNAi adults given α-KG. As reported
previously’, atp-2 RNAi animals live longer than control RNAi
animals (Fig. 3a). However, their lifespan is not further extended by α-KG
(Fig. 3a), indicating that ATP-2 is required for the longevity benefit of
α-KG. This requirement is specific because, in contrast, the lifespan of
the even longer-lived insulin/IGF-1 receptor daf-2(e1370) mutant worms*
is further increased by α-KG (Fig. 3b). Remarkably, oligomycin, an
inhibitor of ATP synthase, also extends the lifespan of adult worms (Extended
Data Fig. 3a). Together, the direct binding of ATP-2 by α-KG, the related
enzymatic inhibition, the reduction in ATP levels and oxygen
consumption, lifespan analysis, and other similarities (see also Supplementary
Notes, Extended Data Fig. 4) to atp-2 knockdown or oligomycin
treatment indicate that α-KG probably extends lifespan primarily by
targeting ATP-2.

The lower ATP content in α-KG-treated animals suggests that increased
longevity by α-KG may involve a state similar to that induced by dietary
restriction. Consistent with this idea, we found that α-KG does not extend
the lifespan of eat-2(ad1116) animals (Fig. 3c), a model of dietary
restriction with impaired pharyngeal pumping and therefore reduced
food intake'®. The longevity of eat-2 mutants requires TOR (encoded
by the C. elegans orthologue let-363)'’, an important mediator of the
effects of dietary restriction on longevity’®. Likewise, α-KG fails to increase
the lifespan of let-363 RNAi animals (Fig. 3d). The AMP-activated protein
kinase (AMPK) is another conserved major sensor of cellular energy
status’. Both AMPK (C. elegans orthologue aak-2) and the FoxO
transcription factor DAF-16 mediate dietary-restriction-induced longevity
in C. elegans fed diluted bacteria”, but neither is required for lifespan
extension in the eat-2 model'*”’. We found that in aak-2 (Extended Data
Fig. 5a) and daf-16 (Fig. 3e) mutants the longevity effect of α-KG is
smaller than in N2 animals (P < 0.0001), suggesting that α-KG
longevity partially depends on AMPK and FoxO; nonetheless, lifespan is
significantly increased by α-KG in aak-2 (24.3%, P < 0.0001) and
daf-16 (29.5%, P < 0.0001) mutant or RNAi animals (Fig. 3e and Extended
Data Fig. 5a, b; data not shown), indicating an AMPK- and FoxO-independent
effect of α-KG in increasing longevity.

The inability of α-KG to further extend the lifespan of let-363 RNAi
animals suggests that α-KG treatment and TOR inactivation extend
lifespan either through the same pathway (with α-KG acting on or upstream
of TOR), or through independent mechanisms or parallel pathways that
converge on a downstream effector. The first model predicts that the
TOR pathway will be less active upon α-KG treatment, whereas if the
latter model were true then TOR would be unaffected by α-KG
treatment. In support of the first model, we found that TOR pathway activity
is decreased in human cells treated with octyl α-KG (Fig. 4a and Extended
Data Fig. 6a, b). However, α-KG does not interact with TOR directly
(Extended Data Fig. 6d, e). Consistent with the involvement of TOR in
α-KG longevity, the FoxA transcription factor PHA-4, which is required
to extend adult lifespan in response to reduced TOR signalling” and for
dietary-restriction-induced longevity in C. elegans”, is likewise required
for α-KG-induced longevity (Fig. 3f). Moreover, autophagy, which is
activated both by TOR inhibition’*” and by dietary restriction”, is
markedly increased in worms treated with α-KG (or ogdh-1 RNAi) and in
atp-2 RNAi animals (Fig. 4b, c, Extended Data Figs 6c, 7 and
Supplementary Notes), as indicated by the prevalence of puncta formed by the
green fluorescent protein (GFP) fusion GFP::LGG-1 (Methods). Autophagy was also induced in
mammalian cells treated with octyl α-KG (Extended Data Fig. 6f).
Furthermore, α-KG does not result in significantly more autophagy in either
atp-2 RNAi or let-363 RNAi worms (Fig. 4b, c). The data provide further
evidence that α-KG decreases TOR pathway activity through the
inhibition of ATP synthase. Similarly, autophagy is induced by oligomycin,
and oligomycin does not augment autophagy in let-363 RNAi worms
(Extended Data Fig. 3b, c).

α-KG is not only a metabolite but also a co-substrate for a large family
of dioxygenases”. The hypoxia inducible factor (HIF-1) is modified
by one of these enzymes, the prolyl 4-hydroxylase (PHD) EGL-9, and
thereafter degraded by the von Hippel-Lindau (VHL) protein*®”’. α-KG
extends the lifespan of animals with loss-of-function mutations in hif-1,
egl-9 and vhl-1 (Fig. 3g and Extended Data Fig. 5c), suggesting that this
pathway does not play a major part in lifespan extension by α-KG.
However, the formal possibility that other α-KG-binding targets have an
additional role in the extension of lifespan by α-KG cannot be eliminated
at this time.

We show that ageing in C. elegans is delayed by 1-KG supplementa- 
tion in adult animals. This longevity effect is probably mediated by ATP 
synthase, which we identified as a direct target of 1-KG, and TOR, a 
major effector of dietary restriction. Identification of new protein targets of α-KG illustrates that regulatory networks acted upon by metabolites are probably more complex than appreciated at present, and that DARTS is a useful method for discovering new protein targets and regulatory functions of metabolites. Our findings demonstrate a novel mechanism for extending lifespan that is mediated by the regulation of cellular energy metabolism by a key metabolite. Such moderation of ATP synthesis by metabolite(s) has probably evolved to ensure energy efficiency by the organism in response to nutrient availability. We suggest that this system may be exploited to confer a dietary-restriction-like state that favours maintenance over growth, and thereby delays ageing and prevents age-related diseases. In fact, the TOR pathway is often hyperactivated in human cancer; inhibition of TOR function by α-KG in normal human cells suggests an exciting role for α-KG as an endogenous tumour suppressor metabolite. Interestingly, physiological increases in α-KG levels have been reported in starved yeast and bacteria, in the liver of starved pigeons, and in humans after physical exercise. The biochemical basis for this increase of α-KG is explained by starvation-based anaplerotic gluconeogenesis, which activates glutamate-linked transaminases in the liver to provide carbon derived from amino acid catabolism. Consistent with this idea, α-KG levels are elevated in starved C. elegans (Fig. 4d). These findings suggest a model in which α-KG is a key metabolite mediating lifespan extension by starvation/dietary restriction (Fig. 4e).

Longevity molecules that delay ageing and extend lifespan have long been a dream of humanity. Endogenous metabolites such as α-KG that can alter C. elegans lifespan suggest that an internal mechanism may exist that is accessible to intervention; whether this can translate into manipulating the ageing process in humans remains to be seen.

Figure 4 | Inhibition of ATP synthase by α-KG causes a conserved decrease in TOR pathway activity. a, Decreased phosphorylation of mammalian TOR substrates in U87 cells treated with octyl α-KG or oligomycin. Similar results were obtained in HEK-293 cells, normal human fibroblasts and mouse embryonic fibroblasts (data not shown). P, phospho. b, Increased autophagy in animals treated with α-KG or RNAi for atp-2 or let-363. Photographs were taken at ×100 magnification. c, GFP::LGG-1 puncta quantified using ImageJ (Methods). Data show results of 2-3 independent experiments. Bars indicate the mean. ****P < 0.0001; NS, not significant (t-test, two-tailed, two-sample unequal variance). d, α-KG levels are increased in starved worms. **P < 0.01 (t-test, two-tailed, two-sample unequal variance). Mean ± s.d. is plotted. e, Model of α-KG-mediated longevity.


METHODS SUMMARY

Lifespan analysis. All lifespan assays were conducted at 20 °C on solid nematode growth media (NGM) and were replicated in at least two independent experiments. P values were determined by the log-rank (Mantel-Cox) test; survival curves were generated using GraphPad Prism. All lifespan data are available in Extended Data Table 2.

DARTS. Human Jurkat cell lysates were incubated with α-KG and digested using Pronase. Proteins protected from proteolysis by α-KG binding were analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) as described previously, and identified by searching against the human Swissprot database (release 57.15) using Mascot, with all peptides meeting a significance threshold of P < 0.05.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS

Nematode strains and maintenance. C. elegans strains were maintained using standard methods. The following strains were used (strain, genotype): Bristol N2, wild type; DA1116, eat-2(ad1116)II; CB1370, daf-2(e1370)III; CF1038, daf-16(mu86)I; PD8120, smg-1(cc546ts)I; SM190, smg-1(cc546ts)I;pha-4(zu225)V; RB754, aak-2(ok524)X; ZG31, hif-1(ia4)V; ZG5%, hif-1(ia7)V; JT307, egl-9(sa307)V; CB5602, vhl-1(ok161)X; DA2123, adIs2122[lgg-1::GFP + rol-6(su1006)]. All strains were obtained from the Caenorhabditis Genetics Center (CGC).

RNAi in C. elegans. RNAi in C. elegans was accomplished by feeding worms HT115(DE3) bacteria expressing target-gene double-stranded RNA (dsRNA) from the pL4440 vector. dsRNA production was induced overnight on plates containing 1 mM isopropyl-β-D-thiogalactoside (IPTG). All RNAi feeding clones were obtained from the C. elegans ORF-RNAi Library (Thermo Scientific/Open Biosystems) unless otherwise stated. The C. elegans TOR (let-363) RNAi clone was obtained from Joseph Avruch (MGH/Harvard). Efficient knockdown was confirmed by western blotting of the corresponding protein or by RT-PCR of the mRNA. The primer sequences used for (q)RT-PCR are as follows: atp-2 forward, TGACAACATTTTCCGTTTCACCG; atp-2 reverse, AAATAGCCTGGACGGATGTGAT; let-363 forward, GATCCGAGACAAGATGAACGTG; let-363 reverse, ACAATTTGGAACCCAACCAATC; ogdh-1 forward, TGATTTGGACCGAGAATTCCTT; ogdh-1 reverse, GGATCAGACGTTTGAACAGCAC.

We validated the RNAi knockdown of both ogdh-1 and atp-2 by quantitative RT-PCR and also of atp-2 by western blotting. Transcripts of ogdh-1 were reduced by 85%, and transcript and protein levels of atp-2 were reduced by 52% and 83%, respectively, in larvae that were cultivated on bacteria that expressed the corresponding dsRNAs. In addition, RNAi of atp-2 in this study was associated with delayed post-embryonic development and larval arrest, consistent with the phenotypes of atp-2(ua2) animals. Analysis by (q)RT-PCR indicated a modest but significant 26% decrease in let-363 transcripts in larvae undergoing RNAi; moreover, molecular markers for autophagy were induced in these animals, and the lifespan of adults was extended, consistent with partial inactivation of the kinase.
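The knockdown percentages above are reductions relative to control RNAi. The text does not say how the qRT-PCR data were quantified; a common choice is the comparative threshold-cycle (2^−ΔΔCt) method, and the Python sketch below assumes that method. The helper names `relative_expression` and `percent_knockdown` are illustrative, and the Ct values in the usage example are hypothetical.

```python
def relative_expression(ct_target_rnai: float, ct_ref_rnai: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target mRNA (RNAi vs control) by the 2^-ddCt method.

    ASSUMPTION: comparative Ct method with a reference gene; efficiency=2.0
    assumes the amplicon doubles every cycle.
    """
    d_ct_rnai = ct_target_rnai - ct_ref_rnai   # normalize target to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_rnai - d_ct_ctrl              # RNAi relative to control
    return efficiency ** -dd_ct


def percent_knockdown(fold_change: float) -> float:
    """Percentage reduction of the transcript relative to control."""
    return (1.0 - fold_change) * 100.0


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles later after RNAi
    fc = relative_expression(25.0, 18.0, 22.0, 18.0)
    print(f"fold change {fc:.3f}, knockdown {percent_knockdown(fc):.1f}%")
```

Under this model, an 85% reduction such as the one reported for ogdh-1 would correspond to a ΔΔCt of about log2(1/0.15), or roughly 2.7 cycles.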

In lifespan experiments, we used RNAi to inactivate atp-2, ogdh-1 and let-363 in mature animals in the presence or absence of exogenous α-KG. The concentration of α-KG used in these experiments (8 mM) was empirically determined to be most beneficial for wild-type animals (Fig. 1c). This approach enabled us to evaluate the contribution of essential proteins and pathways to the longevity conferred by supplementary α-KG. Specifically, we were able to substantially but not fully inactivate atp-2 in adult animals that had completed embryonic and larval development. As described earlier, supplementation with 8 mM α-KG did not further extend (and in fact, on one occasion, even decreased) the lifespan of atp-2 RNAi animals (Extended Data Table 2), indicating that atp-2 is required for α-KG to promote longevity. On the other hand, a complete inactivation of atp-2 would be lethal, and thereby mask the benefit of ATP synthase inhibition by α-KG.

Lifespan analysis. Lifespan assays were conducted at 20 °C on solid nematode growth media (NGM) using standard protocols and were replicated in at least two independent experiments. C. elegans were synchronized by performing either a timed egg lay or an egg preparation (lysing ~100 gravid worms in 70 µl M9 buffer, 25 µl bleach (10% sodium hypochlorite solution) and 5 µl 10 N NaOH). Young adult animals were picked onto NGM assay plates containing 1.5% dimethyl sulfoxide (DMSO; Sigma, D8418), 49.5 µM 5-fluoro-2′-deoxyuridine (FUDR; Sigma, F0503), and α-KG (Sigma, K1128) or vehicle control (H2O). FUDR was included to prevent progeny production. Media containing α-KG were adjusted to pH 6.0 (that is, the same pH as the control plates) by the addition of NaOH. All compounds were mixed into the NGM media after autoclaving and before solidification of the media. Assay plates were seeded with OP50 (or a designated RNAi feeding clone; see later). Worms were moved to new assay plates every 4 days to ensure that sufficient food was present at all times and to reduce the risk of mould contamination. To assess survival, the animals were prodded with a platinum wire every 2-3 days, and those that failed to respond were scored as dead. For mutant strains, the corresponding parent strain was used as a control in the same experiment.
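The egg-preparation and plate recipes above imply working concentrations that are easy to check with C1V1 = C2V2 arithmetic. The Python sketch below does this; it assumes that Sigma K1128 is α-ketoglutaric acid as the free acid (C5H6O5, about 146.11 g mol−1), so check the supplier's certificate before relying on the mass, because a salt form would change it. The helper names are illustrative.

```python
def diluted(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Concentration of one component after mixing (C1 * V1 = C2 * V2)."""
    return stock_conc * stock_vol / total_vol


# Egg preparation: 70 µl M9 + 25 µl bleach (10% NaOCl) + 5 µl 10 N NaOH
M9_UL, BLEACH_UL, NAOH_UL = 70.0, 25.0, 5.0
total_ul = M9_UL + BLEACH_UL + NAOH_UL               # 100 µl in total
naocl_percent = diluted(10.0, BLEACH_UL, total_ul)   # % NaOCl in the mix
naoh_normal = diluted(10.0, NAOH_UL, total_ul)       # N NaOH in the mix

# ASSUMPTION: Sigma K1128 is the free acid (C5H6O5, ~146.11 g/mol)
AKG_MW_G_PER_MOL = 146.11


def grams_needed(conc_mM: float, volume_ml: float, mw_g_per_mol: float) -> float:
    """Mass (g) of solute for a given millimolar concentration and volume."""
    return (conc_mM / 1000) * (volume_ml / 1000) * mw_g_per_mol


akg_g_per_litre = grams_needed(8.0, 1000.0, AKG_MW_G_PER_MOL)  # 8 mM assay plates

if __name__ == "__main__":
    print(f"NaOCl {naocl_percent:.2f}%, NaOH {naoh_normal:.2f} N")
    print(f"alpha-KG for 1 l of 8 mM NGM: {akg_g_per_litre:.3f} g")
```

The mix therefore contains about 2.5% NaOCl and 0.5 N NaOH, and roughly 1.17 g of the free acid is needed per litre of 8 mM medium. Adding a free acid at this concentration is consistent with the need to bring the α-KG plates back to pH 6.0 with NaOH.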

For lifespan experiments involving RNAi, the plates also contained 1 mM IPTG (Acros, CAS 367-93-1) and 50 µg ml−1 ampicillin (Fisher, BP1760-25). RNAi was accomplished by feeding N2 worms HT115(DE3) bacteria expressing target-gene dsRNA from pL4440 (ref. 32); control RNAi was done in parallel for every experiment by feeding N2 worms HT115(DE3) bacteria expressing either GFP dsRNA or empty vector (which gave identical lifespan results).

Lifespan experiments with oligomycin (Cell Signaling, 9996) were performed as described for α-KG (that is, NGM plates with 1.5% DMSO and 49.5 µM FUDR; N2 worms; OP50 bacteria).

For lifespan experiments concerning smg-1(cc546ts);pha-4(zu225) and smg-1(cc546ts), the strains were grown at 24 °C from egg to L4 stage. This temperature inactivates the smg-1 temperature-sensitive allele, preventing mRNA surveillance-mediated degradation of the pha-4(zu225) mRNA, which contains a premature stop codon; the stabilized mRNA thus produces a truncated but fully functional PHA-4 transcription factor. Then, at the L4 stage, the temperature was shifted to 20 °C, which restores smg-1 function and thereby results in the degradation of pha-4(zu225) mRNA. Treatment with α-KG began at the L4 stage.

All lifespan data, including sample sizes, are available in Extended Data Table 2. Sample sizes were chosen on the basis of standards in the field, as reflected in published studies; no statistical method was used to predetermine sample size. Animals were assigned randomly to the experimental groups. Worms that ruptured, bagged (that is, exhibited internal progeny hatching) or crawled off the plates were censored. Lifespan data were analysed using GraphPad Prism; P values were calculated using the log-rank (Mantel-Cox) test.
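To make the censoring rules concrete, here is a minimal Python sketch of a two-group log-rank (Mantel-Cox) test in which ruptured, bagged or escaped worms enter as censored observations. It illustrates the standard statistic and is not the GraphPad Prism implementation used in the study; the survival data in the usage example are hypothetical. The P value uses the fact that a χ² variable with 1 degree of freedom has upper tail erfc(√(χ²/2)).

```python
import math


def log_rank(times_a: list[float], events_a: list[int],
             times_b: list[float], events_b: list[int]) -> tuple[float, float]:
    """Two-group log-rank (Mantel-Cox) test.

    times_*: day of death or censoring for each worm.
    events_*: 1 if the death was observed, 0 if the worm was censored
    (e.g. ruptured, bagged or crawled off the plate).
    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    var = 0.0
    for t in death_times:
        # worms at risk at time t: not dead or censored before t
        n = sum(1 for ti, _, _ in data if ti >= t)
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        # deaths observed exactly at time t
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no informative deaths; test is undefined")
    chi2 = observed_minus_expected ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


if __name__ == "__main__":
    # Hypothetical lifespans in days; event 0 marks a censored worm
    chi2, p = log_rank([10, 12, 12, 14, 15], [1, 1, 0, 1, 1],
                       [14, 16, 18, 20, 22], [1, 1, 1, 0, 1])
    print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Censored worms contribute to the number at risk up to their censoring day but never count as deaths, which is why they are not simply discarded.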

Statistical analyses. All experiments were repeated at least two times with identical or similar results. Data represent biological replicates. Appropriate statistical tests were used for every figure, and the data meet the assumptions of the statistical tests described for each figure. Mean ± s.d. is plotted in all figures unless stated otherwise.

Food preference assay. The protocol was adapted from Abada et al. A 10 cm NGM plate was seeded with two spots of OP50 as shown in Extended Data Fig. 1e. After the OP50 lawns had dried over 2 days at room temperature, vehicle (H2O) was added to the top of one lawn and α-KG (8 mM) to the top of the other, and both were allowed to dry over 2 days at room temperature. Approximately 50-100 synchronized day 1 adult worms were placed onto the centre of the plate, and their preference for either bacterial lawn was recorded after 3 h at room temperature.

Target identification using DARTS. For unbiased target identification (Fig. 2a), human Jurkat cells were lysed using M-PER (Thermo Scientific, 78501) with the addition of protease inhibitors (Roche, 11836153001) and phosphatase inhibitors. TNC buffer (50 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM CaCl2) was added to the lysate and protein concentration was then determined using the BCA Protein Assay kit (Pierce, 23227). Cell lysates were incubated with either vehicle (H2O) or α-KG for 1 h on ice followed by an additional 20 min at room temperature. Digestion was performed using Pronase (Roche, 10165921001) at room temperature for 30 min and stopped using excess protease inhibitors with immediate transfer to ice. The resulting digests were separated by SDS-PAGE and visualized using SYPRO Ruby Protein Gel Stain (Invitrogen, S12000). The band with increased staining in the α-KG lane (corresponding to potential protein targets that are protected from proteolysis by the binding of α-KG) and the matching area of the control lane were excised, in-gel trypsin digested, and subjected to liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis as described previously. Mass spectrometry results were searched against the human Swissprot database (release 57.15) using Mascot version 2.3.0, with all peptides meeting a significance threshold of P < 0.05.

For target verification by DARTS with western blotting (Fig. 2b), HeLa cells were lysed in M-PER buffer (Thermo Scientific, 78501) with the addition of protease inhibitors (Roche, 11836153001) and phosphatase inhibitors (50 mM NaF, 10 mM β-glycerophosphate, 5 mM sodium pyrophosphate, 2 mM Na3VO4). Chilled TNC buffer (50 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM CaCl2) was added to the protein lysate, and protein concentration of the lysate was measured by the BCA Protein Assay kit (Pierce, 23227). The protein lysate was then incubated with vehicle control (H2O) or varying concentrations of α-KG for 3 h at room temperature with shaking at 600 r.p.m. in an Eppendorf Thermomixer. Pronase (Roche, 10165921001) digestions were performed for 20 min at room temperature, and stopped by adding SDS loading buffer and immediately heating at 70 °C for 10 min. Samples were subjected to SDS-PAGE on 4-12% Bis-Tris gradient gels (Invitrogen, NP0322BOX) and western blotted for the ATP synthase subunits ATP5B (Sigma, AV48185), ATP5O (Abcam, ab91400) and ATP5A (Abcam, ab110273). Binding between α-KG and PHD-2 (encoded by EGLN1; Cell Signaling, 4835), for which α-KG is a co-substrate, was confirmed by DARTS. GAPDH (Ambion, AM4300) was used as a negative control.

For DARTS using C. elegans (Extended Data Fig. 2a), wild-type animals of various ages were grown on NGM/OP50 plates, washed four times with M9 buffer, and immediately placed in a −80 °C freezer. Animals were lysed in HEPES buffer (40 mM HEPES pH 8.0, 120 mM NaCl, 10% glycerol, 0.5% Triton X-100, 10 mM β-glycerophosphate, 50 mM NaF, 0.2 mM Na3VO4, protease inhibitors (Roche, 11836153001)) using Lysing Matrix C tubes (MP Biomedicals, 6912-100) and the FastPrep-24 (MP Biomedicals) high-speed bench-top homogenizer in a 4 °C room (disrupt worms for 20 s at 6.5 m s−1, rest on ice for 1 min; repeat twice). Lysed animals were centrifuged at 14,000 r.p.m. for 10 min at 4 °C to pellet worm debris, and the supernatant was collected for DARTS. Protein concentration was determined by BCA Protein Assay kit (Pierce, 23223). A worm lysate concentration of 1.13 µg µl−1 was used for the DARTS experiment. All steps were performed on ice or at 4 °C to help prevent premature protein degradation. TNC buffer (50 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM CaCl2) was added to the worm lysates. Worm lysates were incubated with vehicle control (H2O) or α-KG for 1 h on ice and then 50 min at room temperature. Pronase (Roche, 10165921001) digestions were performed for 30 min at room temperature and stopped by adding SDS loading buffer and heating at 70 °C for 10 min. Samples were then subjected to SDS-PAGE on NuPAGE Novex 4-12% Bis-Tris gradient gels (Invitrogen, NP0322BOX), and western blotting was carried out with an antibody against ATP5B (Sigma, AV48185) that also recognizes ATP-2.

Complex V activity assay. Complex V activity was assayed using the MitoTox OXPHOS Complex V Activity Kit (Abcam, ab109907). Vehicle (H2O) or α-KG was mixed with the enzyme before the addition of phospholipids. In experiments using octyl α-KG, vehicle (1% DMSO) or octyl α-KG was added with the phospholipids. Complex V activity was expressed relative to vehicle. Oligomycin (Sigma, 04876) was used as a positive control for the assay.

Isolation of mitochondria from mouse liver. Animal studies were performed 
under approved University of California, Los Angeles animal research protocols. 
Mitochondria from 3-month-old C57BL/6 mice were isolated as described. Briefly,
livers were extracted, minced at 4 °C in MSHE plus BSA (70 mM sucrose, 210 mM 
mannitol, 5 mM HEPES, 1 mM EGTA, and 0.5% fatty acid free BSA, pH 7.2), and 
rinsed several times to remove blood. All subsequent steps were performed on ice 
or at 4 °C. The tissue was disrupted in ten volumes of MSHE plus BSA with a glass 
Dounce homogenizer (5-6 strokes) and the homogenate was centrifuged at 800g 
for 10 min to remove tissue debris and nuclei. The supernatant was decanted through 
a cell strainer and centrifuged at 8,000g for 10 min. The dark mitochondrial pellet 
was resuspended in MSHE plus BSA and re-centrifuged at 8,000g for 10 min. The 
final mitochondrial pellets were used for various assays as described later. 

Submitochondrial particle ATPase assay. ATP hydrolysis by ATP synthase was measured using submitochondrial particles (see ref. 41 and references therein). Mitochondria were isolated from mouse liver as described earlier. The final mitochondrial pellet was resuspended in buffer A (250 mM sucrose, 10 mM Tris-HCl, 1 mM ATP, 5 mM MgCl2 and 0.1 mM EGTA, pH 7.4) at 10 pg wt, subjected to sonication on ice (Fisher Scientific Model 550 Sonic Dismembrator; medium power, alternating between 10 s intervals of sonication and resting on ice for a total of 60 s of sonication), and then centrifuged at 18,000g for 10 min at 4 °C. The supernatant was collected and centrifuged at 100,000g for 45 min at 4 °C. The final pellet (submitochondrial particles) was resuspended in buffer B (250 mM sucrose, 10 mM Tris-HCl and 0.02 mM EGTA, pH 7.4).

The SMP ATPase activity was assayed using the Complex V Activity Buffer as described earlier. The production of ADP is coupled to the oxidation of NADH to NAD+ through pyruvate kinase and lactate dehydrogenase. The addition of α-KG (up to 10 mM) did not affect the activity of pyruvate kinase or lactate dehydrogenase when external ADP was added. The rate of decrease in NADH absorbance at 340 nm therefore reflects ATPase activity. Submitochondrial particles (2.18 ng µl−1) were incubated with vehicle or α-KG for 90 min at room temperature before the addition of activity buffer, and the NADH absorbance at 340 nm was then measured every 1 min for 1 h. Oligomycin (Cell Signaling, 9996) was used as a positive control for the assay.
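The coupled assay works because each ATP hydrolysed to ADP is regenerated by pyruvate kinase, and the pyruvate formed is reduced to lactate by lactate dehydrogenase at the cost of one NADH. A rate of absorbance loss at 340 nm can therefore be converted into ATPase activity with the Beer-Lambert law. The Python sketch below uses the standard NADH molar absorptivity (6,220 M−1 cm−1 at 340 nm); the path length, well volume and protein amount in the usage example are hypothetical, and in a microplate the effective path length depends on the fill volume, so it should be measured or corrected for rather than assumed to be 1 cm.

```python
NADH_EPSILON_340 = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1


def atpase_activity(dA340_per_min: float, path_cm: float,
                    volume_ul: float, protein_ug: float) -> float:
    """Specific ATPase activity in nmol ATP min^-1 mg^-1 protein.

    dA340_per_min is the slope of A340 against time (negative while NADH is
    consumed). One NADH is oxidized per ADP formed, so NADH loss = ATP hydrolysed.
    """
    molar_per_min = -dA340_per_min / (NADH_EPSILON_340 * path_cm)  # mol l^-1 min^-1
    nmol_per_min = molar_per_min * (volume_ul * 1e-6) * 1e9       # litres -> nmol
    return nmol_per_min / (protein_ug / 1000.0)                    # per mg protein


def relative_activity(treated: float, vehicle: float) -> float:
    """Activity with alpha-KG (or oligomycin) as a fraction of vehicle."""
    return treated / vehicle


if __name__ == "__main__":
    # Hypothetical: 200 µl well, 1 cm effective path, 20 µg of particles
    v = atpase_activity(-0.0622, 1.0, 200.0, 20.0)
    print(f"{v:.1f} nmol min^-1 mg^-1")
```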

Assay for ATP levels. Normal human diploid fibroblast WI-38 (ATCC, CCL-75)
cells were seeded in 96-well plates at 2 < 10° cells per well. Cells were treated with 
either DMSO (vehicle control) or octyl α-KG at varying concentrations for 2 h in
triplicate. ATP levels were measured using the CellTiter-Glo luminescent ATP assay 
(Promega, G7572); luminescence was read using Analyst HT (Molecular Devices). 
In parallel, identically treated cells were lysed in M-PER (Thermo Scientific, 78501) 
to obtain protein concentration by BCA Protein Assay kit (Pierce, 23223). ATP 
levels were normalized to protein content. Statistical analysis was performed using 
GraphPad Prism (unpaired t-test). 

The assay for ATP levels in C. elegans was carried out as follows. Synchronized day 1 adult wild-type C. elegans were placed on NGM plates containing either vehicle or 8 mM α-KG. On days 2 and 8 of adulthood, 9 replicates and 4 replicates, respectively, of about 100 worms were collected from α-KG or vehicle control plates, washed 4 times in M9 buffer, and frozen at −80 °C. Animals were lysed using Lysing Matrix C tubes (MP Biomedicals, 6912-100) and the FastPrep-24 (MP Biomedicals) high-speed bench-top homogenizer (disrupt worms for 20 s at 6.5 m s−1, rest on ice for 1 min; repeat twice). Lysed animals were centrifuged at 14,000 r.p.m. for 10 min at 4 °C to pellet worm debris, and the supernatant was saved for ATP quantification using the Kinase-Glo Luminescent Kinase Assay Platform (Promega, V6713) according to the manufacturer's instructions. The assay was performed in white opaque 96-well tissue culture plates (Falcon, 353296), and luminescence was measured using Analyst HT (Molecular Devices). ATP levels were normalized to the number of worms. Statistical analysis was performed using Microsoft Excel (t-test, two-tailed, two-sample unequal variance).
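The "two-tailed, two-sample unequal variance" t-test used here and elsewhere in these Methods is Welch's t-test; in Excel it corresponds to T.TEST(array1, array2, 2, 3). The Python sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. Turning these into a P value needs the Student's t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`), which is not reproduced here. The values in the usage example are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group's variance is estimated separately (sample variance, n - 1),
    so the groups need not have equal variances or equal sizes.
    """
    se2_a = variance(a) / len(a)   # squared standard error of mean(a)
    se2_b = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical ATP per worm (arbitrary units), vehicle vs alpha-KG
    vehicle = [1.00, 1.08, 0.95, 1.03]
    akg = [0.80, 0.86, 0.74, 0.90]
    t, df = welch_t(vehicle, akg)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

Unlike Student's pooled-variance test, the degrees of freedom here are generally non-integer and shrink when one group is much noisier than the other.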

Measurement of oxygen consumption rates. Oxygen consumption rate (OCR) measurements were made using a Seahorse XF-24 analyser (Seahorse Bioscience). Cells were seeded in Seahorse XF-24 cell culture microplates at 50,000 cells per well in DMEM media supplemented with 10% FBS and 10 mM glucose, and incubated at 37 °C and 5% CO2 overnight. Treatment with octyl α-KG or DMSO (vehicle control) was for 1 h. Cells were washed in unbuffered DMEM medium (pH 7.4, 10 mM glucose) just before measurement, and maintained in this buffer with the indicated concentrations of octyl α-KG. OCR was measured three times under basal conditions and normalized to protein concentration per well. Statistical analysis was performed using GraphPad Prism.

Measurement of OCR in living C. elegans was carried out as follows. The protocol was adapted from those previously described. Wild-type day 1 adult N2 worms were placed on NGM plates containing 8 mM α-KG or H2O (vehicle control) seeded with OP50 or HT115 E. coli. On day 2 of adulthood, when OCR was assessed, worms were collected and washed four times with M9 to rid the samples of bacteria. (We further verified that α-KG does not affect oxygen consumption of the bacteria; therefore, even if any bacteria remained after the washes, the changes in OCR observed would still be worm specific.) The animals were then seeded in quadruplicate in Seahorse XF-24 cell culture microplates (Seahorse Bioscience, V7-PS) in 200 µl M9 at ~200 worms per well. Oxygen consumption rates were measured seven times under basal conditions and normalized to the number of worms counted per well. The experiment was repeated twice. Statistical analysis was performed using Microsoft Excel (t-test, two-tailed, two-sample unequal variance).

Measurement of mitochondrial respiratory control ratio. Mitochondrial respiratory control ratio (RCR) was analysed using isolated mouse liver mitochondria (see ref. 15 and references therein). Mitochondria were isolated from mouse liver as described earlier. The final mitochondrial pellet was resuspended in 30 µl of MAS buffer (70 mM sucrose, 220 mM mannitol, 10 mM KH2PO4, 5 mM MgCl2, 2 mM HEPES, 1 mM EGTA, and 0.2% fatty acid free BSA, pH 7.2).

Isolated mitochondrial respiration was measured by running coupling and electron flow assays as described. For the coupling assay, 20 µg of mitochondria in complete MAS buffer (MAS buffer supplemented with 10 mM succinate and 2 µM rotenone) were seeded into an XF24 Seahorse plate by centrifugation at 2,000g for 20 min at 4 °C. Just before the assay, the mitochondria were supplemented with complete MAS buffer to a total of 500 µl (with 1% DMSO or octyl α-KG), and warmed at 37 °C for 30 min before starting the OCR measurements. Mitochondrial respiration begins in a coupled state 2; state 3 is initiated by 2 mM ADP; state 4o (oligomycin insensitive, that is, complex V independent) is induced by 2.5 µM oligomycin; and state 3u (FCCP-uncoupled maximal respiratory capacity) by 4 µM FCCP. Finally, 1.5 µg ml−1 antimycin A was injected at the end of the assay. The state 3/state 4o ratio gives the RCR.
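With these definitions, the RCR is the quotient of the ADP-stimulated (state 3) and oligomycin-insensitive (state 4o) rates. The short Python sketch below averages the repeated OCR readings for each state and takes the ratio. The optional subtraction of antimycin A-insensitive (non-mitochondrial) OCR is an assumption: it is common practice, but the text above does not say whether it was applied. The readings in the usage example are hypothetical.

```python
from statistics import mean


def respiratory_control_ratio(state3_ocr: list[float], state4o_ocr: list[float],
                              non_mito_ocr: float = 0.0) -> float:
    """RCR = mean state 3 OCR / mean state 4o OCR.

    non_mito_ocr: optional antimycin A-insensitive OCR subtracted from both
    states (ASSUMPTION; the default applies no correction).
    """
    s3 = mean(state3_ocr) - non_mito_ocr
    s4o = mean(state4o_ocr) - non_mito_ocr
    if s4o <= 0:
        raise ValueError("state 4o OCR must be positive after correction")
    return s3 / s4o


if __name__ == "__main__":
    # Hypothetical OCR readings (pmol O2 min^-1) after ADP and after oligomycin
    print(respiratory_control_ratio([1200, 1180, 1220], [300, 310, 290]))
```

Because a background term is subtracted from both numerator and denominator, whether it is applied can change the RCR noticeably, so it matters to state which convention was used.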

For the electron flow assay, the MAS buffer was supplemented with 10 mM sodium pyruvate (complex I substrate), 2 mM malate (complex II inhibitor) and 4 µM FCCP, and the mitochondria were seeded as described for the coupling assay. After basal readings, the sequential injections were as follows: 2 µM rotenone (complex I inhibitor), 10 mM succinate (complex II substrate), 4 µM antimycin A (complex III inhibitor) and 10 mM/100 µM ascorbate/tetramethylphenylenediamine (complex IV substrate).

ATP synthase enzyme inhibition kinetics. Analysis of ATP synthase inhibition kinetics was performed using isolated mitochondria. Mitochondria were isolated from mouse liver as described earlier. The final mitochondrial pellet was resuspended in MAS buffer supplemented with 5 mM sodium ascorbate (Sigma, A7631) and 5 mM TMPD (Sigma, T7394).

The reaction was carried out in MAS buffer containing 5 mM sodium ascorbate, 5 mM TMPD, luciferase reagent (Roche, 11699695001), octanol or octyl α-KG, variable amounts of ADP (Sigma, A2754), and 3.75 ng µl−1 mitochondria. ATP synthesis was monitored by the increase in luminescence over time using a luminometer (Analyst HT, Molecular Devices). ATP-synthase-independent ATP formation, derived from the oligomycin-insensitive luminescence, was subtracted as background. The initial velocity of ATP synthesis was calculated from the slope of the first 3 min of the reaction, before the velocity begins to decrease. Enzyme inhibition kinetics were analysed by nonlinear least-squares regression using GraphPad Prism.
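The initial-velocity step can be written as an ordinary least-squares slope over the first 3 min after subtracting the oligomycin-insensitive signal. The Python sketch below does only that. Converting luminescence to ATP (for example with an ATP standard curve) and fitting an inhibition model to the velocities at each ADP concentration, which the authors did by nonlinear regression in GraphPad Prism, are left out. The time points and signals in the usage example are hypothetical.

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def initial_velocity(times_min: list[float], lum: list[float],
                     lum_oligomycin: list[float], window_min: float = 3.0) -> float:
    """Initial ATP-synthase-dependent rate, in luminescence units per min.

    lum_oligomycin is the oligomycin-insensitive signal at the same time points,
    subtracted as ATP-synthase-independent background before fitting.
    Only points with t <= window_min are used.
    """
    pts = [(t, a - b) for t, a, b in zip(times_min, lum, lum_oligomycin)
           if t <= window_min]
    if len(pts) < 2:
        raise ValueError("need at least two time points inside the window")
    xs, ys = zip(*pts)
    return ols_slope(list(xs), list(ys))


if __name__ == "__main__":
    # Hypothetical readings every 30 s; the raw signal plateaus after 3 min
    times = [0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5]
    total = [100 + 50 * t if t <= 3 else 250 for t in times]
    background = [100 + 5 * t for t in times]
    print(initial_velocity(times, total, background))
```

Restricting the fit to the early window avoids underestimating the rate once substrate depletion or luciferase signal decay makes the progress curve bend, which is the reason the text limits the slope to the first 3 min.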

Assay for mammalian TOR pathway activity. Mammalian (m)TOR pathway 
activity in cells treated with octyl «-KG or oligomycin was determined by the levels 
of phosphorylation of known mTOR substrates, including S6K (T389), 4E-BP1 
(S65), AKT (S473) and ULK1 (S757). Specific antibodies used: phospho (P)-S6K T389 (Cell Signaling, 9234), S6K (Cell Signaling, 9202S), P-4E-BP1 S65 (Cell
Signaling, 9451S), 4E-BP1 (Cell Signaling, 9452S), P-AKT S473 (Cell Signaling, 4060S), 
AKT (Cell Signaling, 4691S), P-ULK1 S757 (Cell Signaling, 6888), ULK1 (Cell 
Signaling, 4773S) and GAPDH (Santa Cruz Biotechnology, 25778). 

Assay for autophagy. DA2123 animals carrying an integrated GFP::LGG-1 translational fusion gene were used to quantify levels of autophagy. To obtain a synchronized population of DA2123, we performed an egg preparation of gravid adults (by lysing ~100 gravid worms in 70 µl M9 buffer, 25 µl bleach and 5 µl 10 N NaOH) and allowed the eggs to hatch overnight in M9, causing starvation-induced L1 diapause. L1 larvae were deposited onto NGM treatment plates containing vehicle, 8 mM α-KG or 40 µM oligomycin, and seeded with either E. coli OP50, HT115(DE3) with an empty vector, or HT115(DE3) expressing dsRNAs targeting atp-2, let-363 or ogdh-1 as indicated. When the majority of animals in a given sample first reached the mid-L3 stage, individual L3 larvae were mounted onto microscope slides and anaesthetized with 1.6 mM levamisole (Sigma, 31742). Nematodes were observed using an Axiovert 200M Zeiss confocal microscope with an LSM5 Pascal laser, and images were captured using the LSM Image Examiner (Zeiss). For each specimen, GFP::LGG-1 puncta (autophagosomes) in the epidermis, including the lateral seam cells and Hyp7, were counted in three separate regions of 140.97 µm² using 'analyze particles' in ImageJ. Measurements were made blind to both the genotype and supplement. Statistical analysis was performed using Microsoft Excel (t-test, two-tailed, two-sample unequal variance).
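ImageJ's 'analyze particles' counts connected regions of above-threshold pixels in a thresholded image, optionally filtering them by size. As a rough, self-contained illustration of that idea, and not ImageJ's own implementation (the thresholding, connectivity and size settings used for the puncta counts are not given in the text), the Python sketch below counts 8-connected bright regions in a small intensity grid. The threshold, minimum size and toy image are hypothetical.

```python
from collections import deque


def count_puncta(image: list[list[float]], threshold: float, min_area: int = 1) -> int:
    """Count 8-connected regions of pixels >= threshold with >= min_area pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # breadth-first flood fill of one candidate punctum
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] >= threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


if __name__ == "__main__":
    toy = [
        [0, 9, 9, 0, 0],
        [0, 9, 0, 0, 8],
        [0, 0, 0, 0, 0],
        [7, 0, 0, 9, 9],
    ]
    print(count_puncta(toy, threshold=5))              # 4 regions
    print(count_puncta(toy, threshold=5, min_area=2))  # 2 regions
```

The minimum-size filter is what separates single-pixel noise from genuine puncta; the per-specimen value would then be the mean count over the three regions.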

The assay for autophagy in mammalian cells was carried out as follows. HEK-293 cells were seeded in 6-well plates at 2.5 X 10° cells per well in DMEM media supplemented with 10% FBS and 10 mM glucose, and incubated overnight before treatment with either octanol (vehicle control) or octyl α-KG for 72 h. Cells were lysed in M-PER buffer with protease and phosphatase inhibitors. Lysates were subjected to SDS-PAGE on a 4-12% Bis-Tris gradient gel with MES running buffer and western blotted for LC3 (Novus, NB100-2220). LC3 is the mammalian homologue of worm LGG-1, and conversion of the soluble LC3-I to the lipidated LC3-II is activated in autophagy, for example, upon starvation.

Pharyngeal pumping rates of C. elegans treated with 8 mM α-KG. The pharyngeal pumping rates of 20 wild-type N2 worms per condition were assessed. Pharyngeal contractions were recorded for 1 min using a Zeiss M2 BioDiscovery microscope and an attached Sony HDR-XR500V video camera at 12-fold optical zoom. The resulting videos were played back at 0.3× speed using MPlayerX and pharyngeal pumps were counted. Statistical analysis was performed using Microsoft Excel (t-test, two-tailed, two-sample unequal variance).

Assay for α-KG levels in C. elegans. Synchronized adult worms were collected from plates with vehicle (H2O) or 8 mM α-KG, washed three times with M9 buffer, and flash frozen. Worms were lysed in M9 using Lysing Matrix C tubes (MP Biomedicals, 6912-100) and the FastPrep-24 (MP Biomedicals) high-speed bench-top homogenizer in a 4 °C room (disrupt worms for 20 s at 6.5 m s−1, rest on ice for 1 min; repeat three times). Lysed animals were centrifuged at 14,000 r.p.m. for 10 min at 4 °C to pellet worm debris, and the supernatant was saved. The protein concentration of the supernatant was determined by the BCA Protein Assay kit (Pierce, 23223); there was no difference in protein level per worm between α-KG-treated and vehicle-treated animals (data not shown). α-KG content was assessed as described previously, with modifications. Worm lysates were incubated at 37 °C in 100 mM KH2PO4 (pH 7.2), 10 mM NH4Cl, 5 mM MgCl2 and 0.3 mM NADH for 10 min. Glutamate dehydrogenase (Sigma, G2501) was then added to reach a final concentration of 1.83 units ml−1. Under these conditions, glutamate dehydrogenase uses α-KG, NH4+ and NADH to make glutamate. The absorbance decrease was monitored at 340 nm, and the intracellular level of α-KG was determined from the decrease in NADH absorbance. The approximate molarity of α-KG present inside the animals was estimated using the average protein content (~245 ng per worm, from the BCA assay) and volume (~3 nl for adult worms 1.1 mm in length and 60 µm in diameter (http://www.wormatlas.org/hermaphrodite/introduction/Introframeset.html)).
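The molarity estimate combines three numbers: the amount of α-KG in the assay (from the NADH absorbance change, since glutamate dehydrogenase consumes one NADH per α-KG), the number of worm-equivalents in that assay (from protein content at ~245 ng per worm), and the volume of one worm. The ~3 nl figure is consistent with treating an adult as a cylinder 1.1 mm long and 60 µm wide: π × (30 µm)² × 1,100 µm ≈ 3.1 × 10^6 µm³ ≈ 3.1 nl. A Python sketch of the whole calculation follows; the NADH absorptivity (6,220 M−1 cm−1) is the standard literature value, while the absorbance, path length, assay volume and protein amount in the usage example are hypothetical.

```python
import math

NADH_EPSILON_340 = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1


def worm_volume_nl(length_mm: float = 1.1, diameter_um: float = 60.0) -> float:
    """Volume of an adult worm approximated as a cylinder (1 nl = 1e6 µm^3)."""
    radius_um = diameter_um / 2
    return math.pi * radius_um ** 2 * (length_mm * 1000) / 1e6


def akg_internal_mM(delta_a340: float, path_cm: float, assay_volume_ul: float,
                    protein_in_assay_ug: float, protein_per_worm_ng: float = 245.0,
                    volume_per_worm_nl: float = 3.0) -> float:
    """Estimated internal alpha-KG concentration (mM).

    delta_a340: blank-corrected absorbance decrease after adding glutamate
    dehydrogenase; one NADH is oxidized per alpha-KG converted to glutamate.
    """
    akg_mol = delta_a340 / (NADH_EPSILON_340 * path_cm) * assay_volume_ul * 1e-6
    worm_equivalents = protein_in_assay_ug * 1000 / protein_per_worm_ng
    mol_per_worm = akg_mol / worm_equivalents
    return mol_per_worm / (volume_per_worm_nl * 1e-9) * 1000  # mol l^-1 -> mM


if __name__ == "__main__":
    print(f"cylinder volume: {worm_volume_nl():.2f} nl")  # ~3.11 nl
    # Hypothetical assay: dA340 = 0.0622, 1 cm path, 100 µl, 24.5 µg protein
    print(f"{akg_internal_mM(0.0622, 1.0, 100.0, 24.5):.2f} mM")
```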

For quantitative analysis of α-KG in worms using ultra-high-performance liquid chromatography-electrospray ionization-tandem mass spectrometry (UHPLC-ESI/MS/MS), synchronized day 1 adult worms were placed on vehicle plates with or without bacteria for 24 h, and then collected and lysed in the same manner as described earlier. α-KG analysis by LC/MS/MS was carried out on an Agilent 1290 Infinity UHPLC system and 6460 Triple Quadrupole mass spectrometer (Agilent Technologies) using an electrospray ionization (ESI) source with Agilent Jet Stream technology. Data were acquired with Agilent MassHunter Data Acquisition software version B.06.00, and processed for precursor and product ion selection with MassHunter Qualitative Analysis software version B.06.00 and for calibration and quantification with MassHunter Quantitative Analysis for QQQ software version B.06.00.

For UHPLC, 3 µl of calibration standards and samples were injected onto the UHPLC system, which included a G4220A binary pump with a built-in vacuum degasser and a thermostatted G4226A high performance autosampler. An ACQUITY UPLC BEH Amide analytical column (2.1 × 50 mm, 1.7 µm) and a VanGuard BEH Amide Pre-column (2.1 × 5 mm, 1.7 µm) from Waters Corporation were used at a flow rate of 0.6 ml min−1, with 50/50/0.04 acetonitrile/water/ammonium hydroxide with 10 mM ammonium acetate as mobile phase A and 95/5/0.04 acetonitrile/water/ammonium hydroxide with 10 mM ammonium acetate as mobile phase B. The column was maintained at room temperature. The following gradient was applied: 0-0.41 min, 100% B isocratic; 0.41-5.30 min, 100-30% B; 5.30-5.35 min, 30-0% B; 5.35-7.35 min, 0% B isocratic; 7.35-7.55 min, 0-100% B; 7.55-9.55 min, 100% B isocratic.
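The gradient programme above is piecewise linear in %B, so the programmed composition at any time can be recovered by linear interpolation between its breakpoints. The Python sketch below encodes those breakpoints, taken directly from the text, and assumes strictly linear ramps; the composition actually reaching the column will lag behind the programme because of the system's dwell volume, which is not given. The helper names are illustrative.

```python
# (time in min, %B) breakpoints of the UHPLC gradient described above
GRADIENT = [
    (0.00, 100.0), (0.41, 100.0), (5.30, 30.0), (5.35, 0.0),
    (7.35, 0.0), (7.55, 100.0), (9.55, 100.0),
]
FLOW_ML_PER_MIN = 0.6


def percent_b(t_min: float) -> float:
    """Programmed %B at time t, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable")  # breakpoints cover the whole range


def run_volume_ml() -> float:
    """Total mobile phase delivered in one run at constant flow."""
    return FLOW_ML_PER_MIN * GRADIENT[-1][0]


if __name__ == "__main__":
    for t in (0.2, 1.0, 3.0, 5.3, 6.0, 7.45, 9.0):
        print(f"{t:5.2f} min: {percent_b(t):5.1f}% B")
    print(f"mobile phase per run: {run_volume_ml():.2f} ml")
```

Each run therefore uses about 5.7 ml of mobile phase, and the window sent to the mass spectrometer (1 to 5.3 min) falls within the 100-30% B ramp.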


For MS detection, the ESI mass spectra were recorded in negative ionization mode using multiple reaction monitoring (MRM). MRM transitions of α-KG and its internal standard (ISTD) 13C4-α-KG (Cambridge Isotope Laboratories) were determined using a 1 min 37% B isocratic UHPLC method through the column at a flow rate of 0.6 ml min−1. The precursor ion [M−H]− and the product ion [M−CO2−H]− were observed to have the highest signal-to-noise ratios. The precursor and product ions (m/z) are 145.0 and 100.9, respectively, for α-KG, and 149.0 and 104.9 for the ISTD 13C4-α-KG. Nitrogen was used as the drying, sheath and collision gas. All the source and analyser parameters were optimized using Agilent MassHunter Source and iFunnel Optimizer and Optimizer software, respectively. The source parameters were as follows: drying gas temperature 120 °C, drying gas flow 13 l min−1, nebulizer pressure 55 psi, sheath gas temperature 400 °C, sheath gas flow 12 l min−1, capillary voltage 2,000 V, and nozzle voltage 0 V. The analyser parameters were as follows: fragmentor voltage 55 V, collision energy 2 V and cell accelerator voltage 1 V. The UHPLC eluants before 1 min and after 5.3 min were diverted to waste.
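As a sanity check on these transitions, the nominal m/z values can be compared with monoisotopic masses. The Python sketch below uses standard isotope masses: the computed [M−H]− and [M−CO2−H]− ions of α-KG (C5H6O5) come out near 145.01 and 101.02, and those of the 13C4 standard near 149.03 and 105.04. These values agree to within roughly 0.1 m/z with the instrument-optimized values listed above (145.0/100.9 and 149.0/104.9). The assumption that the lost CO2 carries no 13C label is an inference from the +4 shift of both ions, not something the text states.

```python
# Monoisotopic masses (u) of the isotopes involved
C12, C13, H1, O16 = 12.0, 13.0033548, 1.0078250, 15.9949146
PROTON = 1.0072765


def neutral_mass(c: int, h: int, o: int, n13c: int = 0) -> float:
    """Monoisotopic mass of CcHhOo with n13c of its carbons as 13C."""
    return (c - n13c) * C12 + n13c * C13 + h * H1 + o * O16


CO2 = neutral_mass(1, 0, 2)
akg_precursor = neutral_mass(5, 6, 5) - PROTON           # [M-H]- of C5H6O5
akg_product = akg_precursor - CO2                        # [M-CO2-H]-
istd_precursor = neutral_mass(5, 6, 5, n13c=4) - PROTON  # 13C4-alpha-KG
# ASSUMPTION: the CO2 lost from the labelled standard carries no 13C,
# which is what the +4 shift of both ions in the text implies
istd_product = istd_precursor - CO2

if __name__ == "__main__":
    print(f"alpha-KG: {akg_precursor:.2f} -> {akg_product:.2f}")
    print(f"13C4 ISTD: {istd_precursor:.2f} -> {istd_product:.2f}")
```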

Membrane-permeable esters of α-KG. Octyl α-KG, a commonly used membrane-permeable ester of α-KG, was used to deliver α-KG across lipid membranes in experiments using cells and mitochondria. Upon hydrolysis by cellular esterases, octyl α-KG yields α-KG and the by-product octanol. We showed that, whereas the octanol control has no effect (Extended Data Fig. 2e, f and Extended Data Fig. 6a), α-KG alone can bind and inhibit ATP synthase (Fig. 2a, b and Extended Data Fig. 2a, b; data not shown), decrease ATP and OCR (Fig. 2e, g), induce autophagy (Fig. 4b) and increase C. elegans lifespan (Figs 1, 3, Extended Data Figs 1, 5 and Extended Data Table 2). The existence and activity of esterases in our mitochondrial and cell culture experiments have been confirmed using calcein AM (C1430, Molecular Probes), an esterase substrate that fluoresces upon hydrolysis, and also by mass spectrometry (data not shown). The hydrolysis by esterases explains why distinct esters of α-KG, such as 1-octyl α-KG, 5-octyl α-KG and dimethyl α-KG, have similar effects to α-KG (Extended Data Fig. 2g, h and Extended Data Table 2).
Synthesis of octyl α-KG. Synthesis of 1-octyl α-KG has been previously described. Briefly, 1-octanol (0.95 ml, 6.0 mmol), DMAP (37 mg, 0.3 mmol) and DCC (0.743 g, 3.6 mmol) were added at 0 °C to a solution of 1-cyclobutene-1-carboxylic acid (0.295 g, 3.0 mmol) in dry CH2Cl2 (6.0 ml). The solution was stirred for 1 h, then allowed to warm to room temperature and stirred for another 8 h. The precipitate was filtered off and washed with ethyl acetate (3 × 100 ml). The combined organic phases were washed with water and brine, and dried over anhydrous Na2SO4. Flash column chromatography on silica gel eluting with 80/1 hexane/ethyl acetate gave octyl cyclobut-1-enecarboxylate as a clear oil (0.604 g, 96%). O3/O2 was bubbled through a −78 °C solution of this oil (0.211 g, 1.0 mmol) in CH2Cl2 (10 ml) until the solution turned blue. The residual ozone was discharged by bubbling with O2. The reaction was then warmed to room temperature and stirred for another 1 h. Dimethyl sulphide (Me2S, 0.11 ml, 1.5 mmol) was added and the mixture was stirred for another 2 h. The CH2Cl2 was removed in vacuo and the crude product was dissolved in a solution of 2-methyl-2-butene (0.8 ml) in tBuOH (3.0 ml). To this was added dropwise a solution of sodium chlorite (0.147 g, 1.3 mmol) and sodium dihydrogen phosphate monohydrate (0.179 g, 1.3 mmol) in H2O (1.0 ml). The mixture was stirred at room temperature overnight and then extracted with ethyl acetate (3 × 50 ml). The combined organic phases were washed with water and brine, and dried over anhydrous Na2SO4. Flash column chromatography on silica gel eluting with 5/1 hexane/ethyl acetate gave octyl α-KG, which became a pale solid when stored in the refrigerator (0.216 g, 84%).
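The reported yields can be checked against the stated masses and the limiting reagent. That is 3.0 mmol of acid in the esterification and 1.0 mmol of ester in the ozonolysis and oxidation step. This sketch uses standard atomic weights. It assumes the molecular formula C13H22O2 for octyl cyclobut-1-enecarboxylate and C13H22O5 for 1-octyl α-KG. The function names are illustrative only.

```python
from __future__ import annotations

# Standard atomic weights (g/mol)
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass (g/mol) from element counts."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def percent_yield(product_g: float, formula: dict[str, int], limiting_mmol: float) -> float:
    """Isolated mass as a percentage of the theoretical mass from the limiting reagent."""
    theoretical_g = molar_mass(formula) * limiting_mmol / 1000.0
    return 100.0 * product_g / theoretical_g


OCTYL_CYCLOBUTENECARBOXYLATE = {"C": 13, "H": 22, "O": 2}  # assumed formula
ONE_OCTYL_AKG = {"C": 13, "H": 22, "O": 5}  # assumed formula

if __name__ == "__main__":
    print(f"{percent_yield(0.604, OCTYL_CYCLOBUTENECARBOXYLATE, 3.0):.1f}%")  # esterification
    print(f"{percent_yield(0.216, ONE_OCTYL_AKG, 1.0):.1f}%")  # ozonolysis + oxidation
```

The two steps come out at about 95.7% and 83.6%, which round to the reported 96% and 84%.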

Synthesis of 5-octyl L-glutamate. L-Glutamic acid (0.147 g, 1.0 mmol) and anhydrous sodium sulphate (0.1 g) were suspended in octanol (2.0 ml), and tetrafluoroboric acid-dimethyl ether complex (0.17 ml) was added. The suspended mixture was stirred at 21 °C overnight. Anhydrous THF (5 ml) was added and the mixture was filtered through a thick pad of activated charcoal. Anhydrous triethylamine (0.4 ml) was added to the clear filtrate to obtain a milky white slurry. Upon trituration with ethyl acetate (10 ml), the monoester monoacid precipitated. The precipitate was collected, washed with additional ethyl acetate (2 × 5 ml) and dried in vacuo to give the desired product, 5-octyl L-glutamate (0.249 g, 96%), as a white solid. ¹H NMR (500 MHz, acetic acid-d₄): δ 4.12 (dd, J = 6.6, 6.6 Hz, 1H), 4.11 (t, J = 6.8 Hz, 2H), 2.64 (m, 2H), 2.26 (m, 2H), 1.64 (m, 2H), 1.30 (m, 10H), 0.89 (t, J = 7.0 Hz, 3H). ¹³C NMR (125 MHz, acetic acid-d₄): δ 175.0, 174.3, 66.3, 55.0, 32.7, 30.9, 30.11, 30.08, 29.3, 26.7, 26.3, 23.4, 14.4.

Synthesis of 5-octyl D-glutamate. The opposite enantiomer, 5-octyl D-glutamate, was synthesized by exactly the same procedure starting with D-glutamic acid. The spectroscopic data were identical to those of the enantiomeric compound.

Synthesis of 5-octyl α-KG. 1-Benzyl 5-octyl 2-oxopentanedioate was obtained as follows. 5-Octyl L-glutamate (0.249 g) was dissolved in H2O (6.0 ml) and acetic acid (2.0 ml) and cooled to 0 °C. An aqueous solution of sodium nitrite (0.207 g, 3.0 mmol in 4 ml H2O) was added slowly. The reaction mixture was allowed to warm slowly to room temperature and was stirred overnight. The mixture was concentrated. The resulting residue was dissolved in DMF (10 ml), and NaHCO3 (0.42 g, 5.0 mmol) and benzyl bromide (0.242 ml, 2.0 mmol) were added. The mixture was stirred at 21 °C overnight and then extracted with ethyl acetate (3 × 30 ml). The combined organic phase was washed with water and brine and dried over anhydrous MgSO4. Flash column chromatography on silica gel eluting with 7/1 hexanes/ethyl acetate gave the mixed diester 1-benzyl 5-octyl (S)-2-hydroxypentanedioate as a colourless oil. This oil was dissolved in dichloromethane (10.0 ml), and NaHCO3 (0.42 g, 5.0 mmol) and Dess–Martin periodinane (0.509 g, 1.2 mmol) were added. The mixture was stirred at room temperature for 1 h and then extracted with ethyl acetate (3 × 30 ml). The combined organic phase was washed with water and brine and dried over anhydrous MgSO4. Flash column chromatography on silica gel eluting with 5/1 hexanes/ethyl acetate gave the desired 1-benzyl 5-octyl 2-oxopentanedioate (0.22 g, 66%) as a white solid. ¹H NMR (500 MHz, CDCl3): δ 7.38 (m, 5H), 5.27 (s, 2H), 4.05 (t, J = 6.5 Hz, 2H), 3.14 (t, J = 6.5 Hz, 2H), 2.64 (t, J = 6.5 Hz, 2H), 1.59 (m, 2H), 1.28 (m, 10H), 0.87 (t, J = 7.0 Hz, 3H). ¹³C NMR (125 MHz, CDCl3): δ 192.2, 171.9, 160.1, 134.3, 128.7, 128.6, 128.5, 67.9, 65.0, 34.2, 31.7, 29.07, 29.05, 28.4, 27.5, 25.7, 22.5, 14.0.

5-Octyl α-KG (5-(octyloxy)-2,5-dioxopentanoic acid) was obtained as follows. 5% Pd/C (80 mg) was added to a solution of 1-benzyl 5-octyl 2-oxopentanedioate (0.12 g, 0.344 mmol) in ethyl acetate (15 ml). A stream of argon was passed over the mixture. The argon was then replaced with hydrogen gas, and the mixture was stirred vigorously for 15 min. The mixture was filtered through a thick pad of Celite to give the desired product, 5-octyl α-KG (0.088 g, 99%), as a white solid. ¹H NMR (500 MHz, CDCl3): δ 8.16 (br s, 1H), 4.06 (t, J = 6.5 Hz, 2H), 3.18 (t, J = 6.5 Hz, 2H), 2.69 (t, J = 6.0 Hz, 2H), 1.59 (m, 2H), 1.26 (m, 10H), 0.85 (t, J = 7.0 Hz, 3H). ¹³C NMR (125 MHz, CDCl3): δ 193.8, 172.7, 160.5, 65.5, 33.0, 31.7, 29.08, 29.06, 28.4, 27.8, 25.8, 22.5, 14.0.
Extended Data Figure 1 | Supplementation with α-KG extends C. elegans adult lifespan but does not change the growth rate of bacteria, or the food intake, pharyngeal pumping rate or brood size of the worms. a, Robust lifespan extension in adult C. elegans by α-KG. 8 mM α-KG increased the mean lifespan of N2 worms by an average of 47.3% in three independent experiments (P < 0.0001 for every experiment, by log-rank test). Experiment 1, mean lifespan (days of adulthood) with vehicle treatment (m_veh) = 18.9 (n = 87 animals tested), m_α-KG = 25.8 (n = 96); experiment 2, m_veh = 17.5 (n = 119), m_α-KG = 25.4 (n = 97); experiment 3, m_veh = 16.3 (n = 100), m_α-KG = 26.1 (n = 104). b, Worms supplemented with 8 mM α-KG and worms with RNAi knockdown of α-KGDH (encoded by ogdh-1) have increased α-KG levels. Young adult worms were placed on treatment plates seeded with control HT115 E. coli or HT115 expressing ogdh-1 dsRNA, and α-KG content was assayed after 24 h (see Methods). c, α-KG treatment beginning at the egg stage and treatment beginning in adulthood produced identical lifespan increases. Light red, treatment with vehicle control throughout larval and adult stages (m = 15.6, n = 95); dark red, treatment with vehicle during larval stages and with 8 mM α-KG at adulthood (m = 26.3, n = 102), P < 0.0001 (log-rank test); orange, treatment with 8 mM α-KG throughout larval and adult stages (m = 26.3, n = 102), P < 0.0001 (log-rank test). d, α-KG does not alter the growth rate of OP50 E. coli, the standard laboratory food source for nematodes. α-KG (8 mM) or vehicle (H2O) was added to standard LB medium and the pH was adjusted to 6.6 with NaOH. Bacterial cells from the same overnight OP50 culture were added to the LB + α-KG mixture at a 1:40 dilution and placed in a 37 °C incubator shaker at 300 r.p.m. Absorbance at 595 nm was read at 1 h intervals to generate the growth curve. e, Schematic representation of the food preference assay. f, N2 worms show no preference between OP50 E. coli treated with vehicle or α-KG (P = 0.85, by t-test, two-tailed, two-sample unequal variance), nor between identically treated OP50 E. coli. g, The pharyngeal pumping rate of C. elegans on 8 mM α-KG is not significantly altered (by t-test, two-tailed, two-sample unequal variance). h, Brood size of C. elegans treated with 8 mM α-KG. Brood size analysis was conducted at 20 °C. Ten L4 wild-type worms were each placed singly onto an NGM plate containing vehicle or 8 mM α-KG. Each worm was transferred to a new plate every day, and the eggs laid were allowed to hatch and develop on the previous plate. Hatchlings were counted as they were removed from the plate with a vacuum. Brood size did not differ significantly between animals on 8 mM α-KG and animals on vehicle plates (P = 0.223, by t-test, two-tailed, two-sample unequal variance). Mean ± s.d. is plotted in all cases.
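Several comparisons in these legends use a two-tailed, two-sample unequal-variance (Welch's) t-test. Below is a minimal sketch of the test statistic and the Welch–Satterthwaite degrees of freedom. The input values are hypothetical, not data from this study. Turning t and df into a P value requires the Student t distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`. That step is not reproduced here.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error, using sample variance (n - 1)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    vehicle = [1.0, 2.0, 3.0]  # hypothetical measurements
    treated = [4.0, 5.0, 6.0]
    t, df = welch_t(vehicle, treated)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Unlike Student's t-test, this version does not pool the two variances. That is what "two-sample unequal variance" refers to.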


©2014 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 2 graphics: a, DARTS western blot (Pronase 1:50, 1:100, 1:300; α-KG 0, 100, 500 µM); c, oxygen consumption versus time (min) for control and atp-2 RNAi worms; e, g, oxygen consumption rate versus time (min) with sequential injections of ADP, oligomycin (Oligo), FCCP and antimycin A (AA) for DMSO, octanol, vehicle, 1-octyl α-KG and 5-octyl α-KG.]


Extended Data Figure 2 | α-KG binds to the β subunit of ATP synthase and inhibits the activity of complex V but not of the other ETC complexes. a, Western blot showing protection of the ATP-2 protein from Pronase digestion upon α-KG binding in the DARTS assay. The antibody for human ATP5B (Sigma, AV48185) recognizes the epitope IMNVIGEPIDERGPIKTKQFAPIHAEAPEFMEMSVEQEILVTGIKVVDLL, which has 90% identity to C. elegans ATP-2. The lower-molecular-weight band near 20 kDa is a proteolytic fragment of the full-length protein corresponding to the domain directly bound by α-KG. b, α-KG does not affect complex IV activity. Complex IV activity was assayed with the MitoTox OXPHOS Complex IV Activity Kit (Abcam, ab109906) and compared with vehicle (H2O) controls. Potassium cyanide (Sigma, 60178) was used as a positive control for the assay. Complex V activity was assayed with the MitoTox Complex V OXPHOS Activity Microplate Assay (Abcam, ab109907). c, atp-2 RNAi worms have lower oxygen consumption than controls (gfp in RNAi vector), P < 0.0001 (t-test, two-tailed, two-sample unequal variance) for the entire time series (two independent experiments), similar to the α-KG-treated worms shown in Fig. 2g. d, α-KG does not affect electron flow through the ETC. Oxygen consumption rate (OCR) of isolated mouse liver mitochondria was measured at baseline (pyruvate and malate as complex I substrate and complex II inhibitor, respectively, in the presence of FCCP). It was then measured in response to sequential injection of rotenone (Rote; complex I inhibitor), succinate (Succ; complex II substrate), antimycin A (AA; complex III inhibitor) and ascorbate/tetramethylphenylenediamine (Asc/TMPD; cytochrome c (complex IV) substrate). After 30 min treatment with 800 µM octyl α-KG, complex I (C I), complex II (C II) and complex IV (C IV) respiration were unchanged, whereas complex V was inhibited by the same treatment (see Fig. 2h) (two independent experiments). e, f, Neither octanol nor the DMSO vehicle control produced a significant difference in coupling (e) or electron flow (f). g, h, Treatment with 1-octyl α-KG or 5-octyl α-KG gave identical results in the coupling (g) and electron flow (h) assays. Mean ± s.d. is plotted in all cases.
Extended Data Figure 3 | Treatment with oligomycin extends C. elegans lifespan and enhances autophagy in a manner dependent on let-363. a, Oligomycin extends the lifespan of adult C. elegans in a concentration-dependent manner. Treatment with oligomycin began at the young adult stage. 40 µM oligomycin increased the mean lifespan of N2 worms by 32.3% (P < 0.0001, by log-rank test); see Extended Data Table 2 for details. b, Confocal images of GFP::LGG-1 puncta in the L3 epidermis of C. elegans treated with vehicle, oligomycin (40 µM) or α-KG (8 mM), and the number of GFP::LGG-1-containing puncta quantified using ImageJ. Bars indicate the mean. Autophagy in C. elegans treated with oligomycin or α-KG is significantly higher than in vehicle-treated control animals (t-test, two-tailed, two-sample unequal variance). c, There is no significant difference (NS) between control worms treated with oligomycin and let-363 RNAi worms treated with vehicle. Nor is there one between vehicle- and α-KG-treated let-363 RNAi worms, consistent with the independent experiments in Fig. 4b, c. Oligomycin also does not augment autophagy in let-363 RNAi worms; if anything, there may be a small decrease, indicated by an asterisk (t-test, two-tailed, two-sample unequal variance). Bars indicate the mean. Photographs were taken at ×100 magnification.
Extended Data Figure 4 | Analyses of oxidative stress in worms treated with α-KG or atp-2 RNAi. a, atp-2 RNAi worms have higher levels of 2′,7′-dichlorofluorescein (DCF) fluorescence than gfp control worms (P < 0.0001, by t-test, two-tailed, two-sample unequal variance). Supplementation with α-KG also leads to higher DCF fluorescence in both HT115-fed (for RNAi) and OP50-fed worms (P = 0.0007 and P = 0.0012, respectively). Reactive oxygen species (ROS) levels were measured using 2′,7′-dichlorodihydrofluorescein diacetate (H2DCF-DA). Because whole-worm lysates were used, this assay measures total cellular oxidative stress. H2DCF-DA (Molecular Probes, D399) was dissolved in ethanol to a stock concentration of 1.5 mg ml⁻¹. Fresh stock was prepared before each use. To measure ROS in worm lysates, H2DCF-DA at a working concentration of 30 ng ml⁻¹ was hydrolysed with 0.1 M NaOH at room temperature for 30 min to generate 2′,7′-dichlorodihydrofluorescein (DCFH). The DCFH was then mixed with whole-worm lysates in a black 96-well plate (Greiner Bio-One). Oxidation of DCFH by ROS yields the highly fluorescent DCF. DCF fluorescence was read at excitation/emission of 485/530 nm using a SpectraMax M5 (Molecular Devices). H2O2 was used as a positive control (data not shown). To prepare the worm lysates, synchronized young adult animals were cultivated for 1 day on plates containing vehicle or 8 mM α-KG and OP50 or HT115 E. coli, then collected and lysed as described in Methods. Mean ± s.d. is plotted. b, α-KG treatment and atp-2 RNAi did not significantly change protein oxidation. Oxidized protein levels were determined by OxyBlot. Synchronized young adult N2 animals were placed onto plates containing vehicle or 8 mM α-KG and seeded with OP50 or with HT115 bacteria expressing control or atp-2 dsRNA. Adult day 2 and day 3 worms were collected, washed four times with M9 buffer and stored at −80 °C for at least 24 h. Laemmli buffer (Bio-Rad, 161-0737) was added to every sample and animals were lysed by alternating boil/freeze cycles. Lysed animals were centrifuged at 14,000 r.p.m. for 10 min at 4 °C to pellet worm debris, and the supernatant was collected for OxyBlot analysis. Protein concentration was determined with the 660 nm Protein Assay (Thermo Scientific, 1861426) and normalized across all samples. Protein carbonylation in each sample was detected with the OxyBlot Protein Oxidation Detection Kit (Millipore, S7150).
Extended Data Figure 5 | Lifespan extension by α-KG in the absence of aak-2, daf-16, hif-1, vhl-1 or egl-9. a, Lifespans of α-KG-supplemented N2 worms, m_veh = 17.5 (n = 119), m_α-KG = 25.4 (n = 97), P < 0.0001; or aak-2(ok524) mutants, m_veh = 13.7 (n = 85), m_α-KG = 17.1 (n = 83), P < 0.0001. b, N2 worms fed gfp RNAi control, m_veh = 18.5 (n = 101), m_α-KG = 23.1 (n = 98), P < 0.0001; or daf-16 RNAi, m_veh = 14.3 (n = 99), m_α-KG = 17.6 (n = 99), P < 0.0001. c, N2 worms, m_veh = 21.5 (n = 101), m_α-KG = 24.6 (n = 102), P < 0.0001; hif-1(ia7) mutants, m_veh = 19.6 (n = 102), m_α-KG = 23.6 (n = 101), P < 0.0001; vhl-1(ok161) mutants, m_veh = 20.0 (n = 98), m_α-KG = 24.9 (n = 100), P < 0.0001; or egl-9(sa307) mutants, m_veh = 16.2 (n = 97), m_α-KG = 25.6 (n = 96), P < 0.0001. P values were determined by the log-rank test. Number of independent experiments: N2 (8), hif-1 (5), vhl-1 (1) and egl-9 (2); see Extended Data Table 2 for details. Two different hif-1 mutant alleles were used. ia4 (shown in Fig. 3g) is a deletion over several introns and exons. ia7 (shown here) is an early stop codon that causes a truncated protein. Both alleles have the same effect on lifespan. We tested both alleles for α-KG longevity and obtained the same results.
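The P values in these lifespan experiments come from the log-rank test. The sketch below is a textbook two-group version with right censoring, written for illustration. It is not necessarily the software the authors used, and the example survival times in the usage note are hypothetical. Censored animals remain in the risk set up to and including their censoring time.

```python
import math
from collections import Counter


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test with right censoring.

    times_*  : time of death or censoring for each animal (e.g. days of adulthood)
    events_* : 1 if the death was observed, 0 if the animal was censored
    Returns (chi-square statistic with 1 degree of freedom, two-sided P value).
    """
    deaths_a = Counter(t for t, e in zip(times_a, events_a) if e)
    deaths_b = Counter(t for t, e in zip(times_b, events_b) if e)
    observed_a = expected_a = var = 0.0
    for t in sorted(set(deaths_a) | set(deaths_b)):
        at_risk_a = sum(1 for x in times_a if x >= t)  # still at risk just before t
        at_risk_b = sum(1 for x in times_b if x >= t)
        n = at_risk_a + at_risk_b
        d = deaths_a[t] + deaths_b[t]
        observed_a += deaths_a[t]
        expected_a += d * at_risk_a / n
        if n > 1:  # hypergeometric variance term; zero when only one animal is at risk
            var += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no informative event times")
    chi2 = (observed_a - expected_a) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

For example, `logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])` gives a statistic of about 5.05 (P ≈ 0.025). The erfc expression works because a chi-square variable with 1 degree of freedom is the square of a standard normal.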
Extended Data Figure 6 | α-KG decreases TOR pathway activity but does not directly interact with TOR. a, Phosphorylation of S6K (T389) was decreased in U87 cells treated with octyl α-KG, but not in cells treated with the octanol control. The same results were obtained using HEK-293 and MEF cells. b, Phosphorylation of AMPK (T172) is upregulated in WI-38 cells upon complex V inhibition by α-KG, consistent with the decreased ATP content in α-KG-treated cells and animals. However, this activation of AMPK appears to require more severe complex V inhibition than the inactivation of mammalian TOR. Increasing phospho (P)-AMPK required either oligomycin or a higher concentration of octyl α-KG. By contrast, concentrations of octyl α-KG comparable to those that decreased cellular ATP content (Fig. 2d) or oxygen consumption (Fig. 2f) were sufficient to decrease P-S6K. The same results were obtained using U87 cells. Samples were subjected to SDS-PAGE on 4-12% Bis-Tris gradient gels (Invitrogen, NP0322BOX) and western blotted with specific antibodies against P-AMPK (T172) (Cell Signaling, 2535S) and AMPK (Cell Signaling, 2603S). c, α-KG still induces autophagy in aak-2 RNAi worms; **P < 0.01 (t-test, two-tailed, two-sample unequal variance). The number of GFP::LGG-1-containing puncta was quantified using ImageJ. Bars indicate the mean. d, e, α-KG does not bind to TOR directly, as determined by DARTS. HEK-293 (d) or HeLa (e) cells were lysed in M-PER buffer (Thermo Scientific, 78501) with added protease inhibitors (Roche, 11836153001) and phosphatase inhibitors (50 mM NaF, 10 mM β-glycerophosphate, 5 mM sodium pyrophosphate, 2 mM Na3VO4). Protein concentration of the lysate was measured with the BCA Protein Assay kit (Pierce, 23227). Chilled TNC buffer (50 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM CaCl2) was added to the protein lysate. The lysate was then incubated with vehicle control (DMSO) or varying concentrations of α-KG for 1 h (d) or 3 h (e) at room temperature. Pronase (Roche, 10165921001) digestions were performed for 20 min at room temperature. They were stopped by adding SDS loading buffer and immediately heating at 95 °C for 5 min (d) or 70 °C for 10 min (e). Samples were subjected to SDS-PAGE on 4-12% Bis-Tris gradient gels (Invitrogen, NP0322BOX) and western blotted with specific antibodies against ATP5B (Santa Cruz, sc58618), mammalian TOR (Cell Signaling, 2972) or GAPDH (Ambion, AM4300). ImageJ was used to quantify the mammalian TOR/GAPDH and ATP5B/GAPDH ratios. The susceptibility of mammalian TOR to Pronase digestion is unchanged in the presence of α-KG. As expected, Pronase resistance in the presence of α-KG is increased for ATP5B, which we identified as a new binding target of α-KG. f, Increased autophagy in HEK-293 cells treated with octyl α-KG was confirmed by western blot analysis of MAP1 LC3 (Novus, NB100-2220). This is consistent with decreased phosphorylation of the autophagy-initiating kinase ULK1 (Fig. 4a).
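The legend states that ImageJ was used to quantify TOR/GAPDH and ATP5B/GAPDH ratios. A minimal sketch of that kind of normalization follows. It assumes background-subtracted band intensities for each lane, and it expresses each ratio relative to the vehicle (DMSO) lane. The intensities are made-up numbers, and the choice of reference lane is an assumption, not a detail from the source.

```python
from __future__ import annotations


def normalized_band_ratios(target: list[float], loading: list[float], reference_lane: int = 0) -> list[float]:
    """Target/loading-control ratio per lane, relative to the reference lane's ratio."""
    if len(target) != len(loading):
        raise ValueError("need one target and one loading-control intensity per lane")
    ratios = [t / g for t, g in zip(target, loading)]
    reference = ratios[reference_lane]
    return [r / reference for r in ratios]


if __name__ == "__main__":
    # Hypothetical ATP5B and GAPDH intensities: lane 0 = DMSO, lanes 1-2 = increasing alpha-KG
    atp5b = [50.0, 80.0, 120.0]
    gapdh = [100.0, 100.0, 120.0]
    print([round(x, 2) for x in normalized_band_ratios(atp5b, gapdh)])
```

In a DARTS readout, a relative ratio above 1 after Pronase digestion would indicate protection of the target protein compared with the vehicle lane.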
Extended Data Figure 7 | Autophagy is enhanced in C. elegans treated with ogdh-1 RNAi. a, Confocal images of GFP::LGG-1 puncta in the epidermis of mid-L3 stage control or ogdh-1 knockdown C. elegans treated with vehicle or α-KG (8 mM). b, Number of GFP::LGG-1 puncta quantified using ImageJ. Bars indicate the mean. ogdh-1 RNAi worms have significantly higher autophagy levels, and α-KG does not significantly augment autophagy in ogdh-1 RNAi worms (t-test, two-tailed, two-sample unequal variance). Photographs were taken at ×100 magnification.
Extended Data Table 1 | Enriched proteins in the α-KG DARTS sample

Protein   Protein name                              Score  Control sample      α-KG sample         Enrichment
symbol                                                     Spectra  Peptides   Spectra  Peptides
ATP5B     ATP synthase subunit beta                 4088   23       9          121      15         5.3
HSPD1     60 kDa heat shock protein                 2352   31       11         138      29         4.5
PKM2      Pyruvate kinase isozymes M1/M2            2203   56       ?          ?        ?          ?
LCP1      Plastin-2                                 1865   14       8          76       13         5.4
ATP5A1    ATP synthase subunit alpha                1616   4        9          61       12         1.5
SHMT2     Serine hydroxymethyltransferase           1060   7        5          33       10         4.7
HSP90AA1  Heat shock protein HSP 90-alpha           952    29       8          44       8          1.5
EEF2      Elongation factor 2                       943    ?        2          37       9          9.3
DDX5      Probable ATP-dependent RNA helicase DDX5  652    ?        3          33       10         4.7
HSPA8     Heat shock cognate 71 kDa protein         615    ?        2          35       10         8.8

Only proteins with at least 15 spectra in the α-KG sample and at least 1.5-fold enrichment are shown.
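The enrichment column appears to equal the ratio of spectral counts in the α-KG sample to those in the control sample (for ATP5B, 121/23 ≈ 5.3). Assuming that definition, the table's selection rule can be sketched as follows. The class and function names are hypothetical, and only rows whose counts are fully legible are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DartsHit:
    symbol: str
    control_spectra: int
    akg_spectra: int

    @property
    def enrichment(self) -> float:
        # Assumed definition: spectral counts in the alpha-KG sample / control sample
        return self.akg_spectra / self.control_spectra


def passes_filter(hit: DartsHit, min_spectra: int = 15, min_fold: float = 1.5) -> bool:
    """Table rule: at least 15 spectra in the alpha-KG sample and >= 1.5-fold enrichment."""
    return hit.akg_spectra >= min_spectra and hit.enrichment >= min_fold


if __name__ == "__main__":
    legible_rows = [
        DartsHit("ATP5B", 23, 121),
        DartsHit("HSPD1", 31, 138),
        DartsHit("LCP1", 14, 76),
        DartsHit("SHMT2", 7, 33),
        DartsHit("HSP90AA1", 29, 44),
    ]
    for hit in legible_rows:
        print(f"{hit.symbol:9s} {hit.enrichment:4.1f} kept={passes_filter(hit)}")
```

Under this definition the recomputed ratios for these five rows round to the listed 5.3, 4.5, 5.4, 4.7 and 1.5.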
Extended Data Table 2 | Summary of lifespan data

m, mean lifespan (days); n, number of animals

Strain                        m Vehicle  m α-KG  % difference  P-value    n Vehicle  n α-KG
N2                            18.9       25.8    36.3          < 0.0001   87         96
N2                            17.5       25.4    45.6          < 0.0001   119        97
N2                            16.3       26.1    60.2          < 0.0001   100        104
eat-2(ad1116)                 22.8       22.9    0.5           0.79       59         40
daf-16(mu86)                  16.3       18.8    15.1          < 0.0001   106        105
eat-2(ad1116)                 21.1       24.0    13.4          0.23       39         59
daf-2(e1370)                  38.0       47.6    25.1          < 0.0001   72         69
N2                            13.2       22.3    69.8          < 0.0001   100        104
daf-16(mu86)                  13.4       17.4    29.5          < 0.0001   ?          72
daf-16 RNAi                   14.3       17.6    22.9          < 0.0001   99         99
N2                            16.1       19.1    19.3          0.0003     97         96
daf-2(e1370)                  38.3       43.9    14.6          < 0.0001   109        101
aak-2(ok524)                  13.7       17.1    24.3          < 0.0001   85         83
aak-2(ok524)                  16.4       17.5    6.7           < 0.0001   97         97
aak-2 RNAi                    16.2       19.9    23.3          < 0.0001   93         92
N2                            15.6       26.3    68.8          < 0.0001   95         102
N2                            15.6       26.3    68.5          < 0.0001   95         102
egl-9(sa307)                  16.2       25.6    58.6          < 0.0001   97         96
egl-9(sa307)                  19.5       27.3    40.3          < 0.0001   95         101
N2                            14.7       21.6    46.9          < 0.0001   100        88
N2                            14.0       20.7    47.9          < 0.0001   112        114
N2                            21.5       24.6    14.6          < 0.0001   101        102
hif-1(ia4)                    20.5       26.0    26.5          < 0.0001   85         71
hif-1(ia7)                    19.6       23.6    20.4          < 0.0001   102        101
hif-1(ia4)                    21.5       24.7    14.7          < 0.0001   88         87
N2                            16.7       23.4    39.7          < 0.0001   104        103
N2                            15.8       22.2    40.5          < 0.0001   104        94
N2                            18.4       24.6    33.4          < 0.0001   99         89
vhl-1(ok161)                  20.0       25.0    24.9          < 0.0001   98         100
hif-1(ia7)                    12.4       17.3    38.9          < 0.0001   97         90
hif-1(ia7)                    17.9       23.7    32.0          < 0.0001   58         55
N2                            16.8       22.4    32.7          < 0.0001   104        101
N2                            15.7       21.6    37.6          < 0.0001   85         99
smg-1(cc546ts)                18.4       23.8    29.5          < 0.0001   110        87
smg-1(cc546ts); pha-4(zu225)  14.2       13.5    -4.9          0.5482     94         109
smg-1(cc546ts); pha-4(zu225)  17.6       15.2    -14.0         0.0877     28         34
N2                            13.6       20.7    51.8          < 0.0001   103        104
smg-1(cc546ts)                16.2       23.0    42.2          < 0.0001   114        121
smg-1(cc546ts); pha-4(zu225)  13.8       15.2    10.2          0.254      45         45
Control                       18.6       23.4    26.1          < 0.0001   94         91
atp-2 RNAi                    22.8       22.5    -1.3          0.3471     97         94
EV RNAi control               18.8       22.7    20.6          < 0.0001   97         94
gfp RNAi control              18.5       23.1    25.3          < 0.0001   101        98
ogdh-1 RNAi                   21.2       21.1    -0.7          0.65       98         100
let-363 RNAi                  22.1       23.6    6.8           0.02       94         95
gfp RNAi control              20.2       ?       37.4          < 0.0001   99         81
let-363 RNAi                  25.1       25.7    2.1           0.9511     96         74
EV RNAi control               22.8       27.2    21.6          < 0.0001   70         72
let-363 RNAi                  27.4       27.2    -0.8          0.7239     64         80
EV RNAi control               19.7       24.3    23.8          < 0.0001   93         84
atp-2 RNAi                    25.3       23.4    -7.4          < 0.0001   87         63

Strain  m Vehicle  m Oligomycin  % difference  P-value    n Vehicle  n Oligomycin  [Oligomycin]
N2                 25.5          25.2          < 0.0001              72            80 µM
N2      20.4       27.0          32.3          < 0.0001   88         82            40 µM
N2                 23.1          13.2          0.0005                50            20 µM
N2                 22.0          7.9           0.0106                90            10 µM

Strain  m Vehicle  m Treatment  % difference  P-value    n Vehicle  n Treatment  Treatment
N2                 16.9         16.8          0.0005                71           Octyl α-KG (500 µM)
N2      14.5       17.0         16.8          < 0.0001   73         60           α-KG
N2      14.0       18.8         33.9          < 0.0001   112        114          Dimethyl α-KG
N2      14.0       20.7         47.8          < 0.0001   112        114          α-KG
N2      15.7       21.6         37.6          < 0.0001   85         99           Disodium α-KG

Strain  m Vehicle  m α-KG  % difference  P-value  n Vehicle  n α-KG  Food source
N2      17.4       21.2    21.6          0.0001   108        55      Live OP50
N2      19.0       23.0    21.0          0.0003   88         46      Dead OP50 (γ-irradiated)
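The % difference column is the change in mean lifespan relative to vehicle. This minimal sketch assumes the formula 100 × (m_treated − m_vehicle)/m_vehicle. The table lists means rounded to one decimal place, so recomputing from them gives slightly different values, for example 36.5% instead of 36.3% for the first N2 row. This suggests the published percentages were computed from unrounded means. Averaging the reported values of the first three N2 rows gives about 47.4%, close to the 47.3% quoted in the Extended Data Figure 1 legend.

```python
def percent_difference(m_vehicle: float, m_treated: float) -> float:
    """Change in mean lifespan relative to vehicle, in percent."""
    return 100.0 * (m_treated - m_vehicle) / m_vehicle


if __name__ == "__main__":
    # First three N2 rows of Extended Data Table 2: (vehicle mean, alpha-KG mean, reported %)
    rows = [(18.9, 25.8, 36.3), (17.5, 25.4, 45.6), (16.3, 26.1, 60.2)]
    for m_veh, m_akg, reported in rows:
        recomputed = percent_difference(m_veh, m_akg)
        print(f"{m_veh} -> {m_akg}: recomputed {recomputed:.1f}%, reported {reported}%")
    average = sum(r[2] for r in rows) / len(rows)
    print(f"average of reported values: {average:.1f}%")
```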
PTEN action in leukaemia dictated by the tissue microenvironment

Cornelius Miething, Claudio Scuoppo, Benedikt Bosbach, Iris Appelmann, Joy Nakitandwe, Jing Ma, Gang Wu, Laura Lintault, Martina Auer, Prem K. Premsrirut, Julie Teruya-Feldstein, James Hicks, Helene Benveniste, Michael R. Speicher, James R. Downing & Scott W. Lowe


PTEN encodes a lipid phosphatase that is underexpressed in many cancers owing to deletions, mutations or gene silencing. PTEN dephosphorylates phosphatidylinositol (3,4,5)-triphosphate, thereby opposing the activity of class I phosphatidylinositol 3-kinases that mediate growth- and survival-factor signalling through phosphatidylinositol 3-kinase effectors such as AKT and mTOR. To determine whether continued PTEN inactivation is required to maintain malignancy, here we generate an RNA interference-based transgenic mouse model that allows tetracycline-dependent regulation of PTEN in a time- and tissue-specific manner. Postnatal Pten knockdown in the haematopoietic compartment produced highly disseminated T-cell acute lymphoblastic leukaemia. Notably, reactivation of PTEN mainly reduced T-cell leukaemia dissemination but had little effect on tumour load in haematopoietic organs. Leukaemia infiltration into the intestine was dependent on CCR9 G-protein-coupled receptor signalling, which was amplified by PTEN loss. Our results suggest that in the absence of PTEN, G-protein-coupled receptors may have an unanticipated role in driving tumour growth and invasion in an unsupportive environment. They further reveal that the role of PTEN loss in tumour maintenance is not invariant and can be influenced by the tissue microenvironment, thereby producing a form of intratumoral heterogeneity that is independent of cancer genotype.

Stable RNA interference using short-hairpin RNAs (shRNAs) provides a powerful approach for studying tumour suppressor gene activity in vitro and in vivo. To explore the role of PTEN loss in tumour maintenance, we developed shRNA transgenic mouse lines targeting Pten. These lines use miR-30-based shRNAs expressed from an inducible tetracycline-responsive element promoter (Fig. 1a and Extended Data Fig. 1). Murine embryonic fibroblasts (MEFs) were obtained from embryonic day (E)13.5 embryos of shPten;R26-rtTA2 double-transgenic mice. These MEFs displayed reversible knockdown of Pten upon doxycycline (Dox) addition and withdrawal, which correlated with increased AKT phosphorylation following insulin stimulation (Fig. 1b and Extended Data Fig. 1c). As expected, Dox-treated mice expressing shPten in multiple tissues developed several tumour types, including T-cell malignancies (Extended Data Fig. 1e-i).

T-cell disease was frequent in the shPten mice, and PTEN is frequently inactivated in human T-cell acute lymphoblastic leukaemia (T-ALL). We therefore focused on the effects of PTEN suppression and reactivation in the lymphoid compartment. We crossed mice transgenic for an shRNA against luciferase (shLuc), and shPten mice, to a Vav-tTA transgenic line. This line expresses a 'Tet-off' Tet transactivator in early B and T cells and drives shRNA expression in a manner that is silenced upon Dox addition (Extended Data Fig. 2 and data not shown). The Vav-tTA;shPten mice displayed thymic hyperplasia (Extended Data Fig. 2a-d). By 16 weeks, a subset of these mice deteriorated and had to be euthanized (Fig. 1c), whereas control animals remained healthy (P < 0.001). Diseased mice showed massive enhanced green fluorescent protein (eGFP)-positive tumours consisting of Thy1.2⁺CD4⁺CD8⁺ double-positive T cells. These cells filled the thoracic cavity and infiltrated the spleen, the lymph nodes and extrahaematopoietic organs such as the liver, kidney and intestine (Fig. 1d, Extended Data Figs 2e, f and 3a, and data not shown). shPten-expressing tumours demonstrated marked Pten knockdown and increased AKT phosphorylation comparable to Pten-null T-cell malignancies (Fig. 1e, see ref. 11).
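The two inducible systems used here respond to Dox in opposite directions. R26-rtTA2 (reverse tTA, 'Tet-on') drives expression of the TRE-controlled shRNA when Dox is present. Vav-tTA ('Tet-off') drives it only when Dox is absent. The sketch below captures this logic as a small truth table. The names are illustrative, and the idealized model ignores leaky expression and dose dependence.

```python
from enum import Enum


class Transactivator(Enum):
    TTA = "tTA"    # Tet-off: activates TRE promoters only when Dox is absent
    RTTA = "rtTA"  # Tet-on (reverse tTA): activates TRE promoters only when Dox is present


def tre_shrna_on(transactivator: Transactivator, dox: bool) -> bool:
    """Idealized on/off state of a TRE-driven shRNA (and its linked eGFP reporter)."""
    if transactivator is Transactivator.TTA:
        return not dox
    return dox


if __name__ == "__main__":
    for system, label in ((Transactivator.RTTA, "R26-rtTA2;shPten"), (Transactivator.TTA, "Vav-tTA;shPten")):
        for dox in (False, True):
            state = "knockdown" if tre_shrna_on(system, dox) else "no knockdown"
            print(f"{label}, Dox={dox}: Pten {state}")
```

This is why adding Dox to Vav-tTA;shPten animals restores PTEN, whereas in shPten;R26-rtTA2 MEFs Dox induces the knockdown.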


[Figure 1 graphics: a, targeting construct (TRE, eGFP, shPten, PGK, ATG*, *Hygromycin, SA, pA; Col1a1 locus; FRT sites); b, immunoblots for pAKT(T308), PTEN, pAKT(S473), AKT and eGFP after insulin stimulation (0 min, 5 min, 30 min, 60 min, 12 h, 24 h) with or without Dox; c, survival (%) versus time (days) for single-transgene, shLuc + tTA, shPten + tTA and shPten + tTA (+Dox) mice, P < 0.001; d, flow cytometry; e, immunoblots for PTEN, pAKT(T308), pAKT(S473), AKT and cNOTCH1; f, PTEN status (high, low) versus disseminated disease.]
Figure 1 | Pten shRNA transgenic mice develop disseminated CD4⁺CD8⁺ double-positive T-cell leukaemia. a, Outline of the targeting construct and the embryonic stem (ES) cell-targeting strategy. FRT, FLP recognition target; pA, polyadenylation site; PGK, phosphoglycerate kinase promoter; SA, splice acceptor site; TRE, tetracycline-responsive element promoter. ATG* denotes truncated ATG sequence; *Hygromycin denotes ATG-less hygromycin resistance gene. b, Immunoblot (western blot) analysis of MEFs from shPten;R26-rtTA2 transgenic mice ± Dox for 5 days, at the indicated time points after stimulation with 100 nM insulin. c, Overall survival of Vav-tTA;shPten mice (n = 49) and controls (n = 98); P < 0.001 by log-rank test. d, Flow cytometric analysis of a representative primary Vav-tTA;shPten tumour for eGFP, Thy1.2,
CD4 and CD8 (n = 10). e, Western blot analysis of T-cell tumours from Trp53−/−, Ptenflox/flox;Lck-cre and Vav-tTA;shPten mice for the indicated proteins.
f, PTEN IHC of bone marrow samples of 31 human patients with T-ALL 
categorized as PTEN high (top left) or low/negative (bottom left). Association 
of PTEN expression with status for disseminated disease was calculated 
using a contingency table (Fisher’s exact test). 
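Fisher's exact test for a 2 × 2 contingency table, as used here for PTEN status versus disseminated disease, can be sketched with exact integer arithmetic. This version uses a common two-sided convention: it sums the probabilities of all tables with the observed margins that are no more probable than the observed table. The example counts are hypothetical, because the patient numbers in each cell cannot be recovered from the figure text.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(x: int) -> int:
        # Number of tables with top-left cell x and the observed margins
        # (numerator of the hypergeometric probability).
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return float(Fraction(tail, comb(n, row1)))


if __name__ == "__main__":
    # Hypothetical counts: rows = PTEN high / low, columns = disseminated yes / no
    print(round(fisher_exact_two_sided(6, 2, 1, 4), 4))
```

Comparing integer numerators avoids the floating-point tolerance that is otherwise needed when deciding which tables are "no more probable" than the observed one.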
Human T-ALLs with PTEN loss often overexpress MYC and can harbour NOTCH1 and CDKN2A mutations”. Analysis of murine shPten-expressing tumours by spectral karyotyping, comparative genomic hybridization (CGH) and sequencing of the gene encoding the T-cell receptor β-chain showed that most primary tumours were clonal and harboured the same recurrent translocations between the Tcra locus and Myc observed in a Pten knockout model and a small subset of human T-ALL (Extended Data Figs 3b, c and 4a, and data not shown)’*"*. One shPten T-ALL showed a Cdkn2a deletion by CGH, and six out of eight tumours analysed showed activating mutations in the Notch1 PEST domain (Fig. 1e and Extended Data Figs 3c, d and 4b). Gene set enrichment analysis (GSEA) of gene expression profiles obtained from shPten leukaemia demonstrated enrichment for a human PTEN-mutated T-ALL signature, and profiles from human PTEN-mutated T-ALLs were enriched for a murine shPten signature (Extended Data Fig. 5a, b). Thus, although all the T-cell leukaemias were initiated by a Pten shRNA, they acquired molecular features reminiscent of the human disease'*"*"°.

The leukaemia arising in shPten mice was highly malignant and rapidly produced disease when transplanted into recipient mice (Extended Data Fig. 6a). Of note, because the Vav-tTA;shPten transgenic mice were of a mixed genetic background, Rag1−/− recipients were used to avoid graft rejection. These recipients succumbed to a highly disseminated form of T-ALL consisting of CD4+ CD8+ double-positive cells that rapidly took over the haematopoietic organs, accumulated to high levels in the peripheral blood, and spread to the liver, kidney and intestine (Fig. 2d and Extended Data Fig. 6b). Notably, decreased PTEN levels were associated with disease dissemination and shorter survival in T-ALL patients (Fig. 1f and Extended Data Fig. 6c), and were also linked with intestinal infiltration in patients with peripheral T-cell lymphoma (Extended Data Fig. 6d, e). The association between PTEN loss and disease dissemination in murine and human T-cell malignancies underscores the relevance of the model to human disease.

We reasoned that the transplanted leukaemias described above would be ideal for our experiments because they are so malignant that individual primary isolates can be studied for their response to different perturbations in multiple secondary recipients. Recipients were monitored for disease development by weekly analysis of peripheral blood for the presence of eGFP+ (shPten-expressing) cells. Upon disease manifestation, a cohort of mice was given Dox to silence the shRNA and reactivate PTEN. Notably, Dox treatment almost tripled the survival time of mice harbouring Vav-tTA;shPten leukaemia (Fig. 2a; P < 0.0001) but had no effect on mice harbouring Pten−/− leukaemia (Extended Data Fig. 6f). Analysis of leukaemic cells collected from mice indicated that the system worked as expected: Dox addition led to upregulation of Pten messenger RNA (Extended Data Fig. 7a-c and data not shown), silenced eGFP and re-established PTEN at endogenous levels (Fig. 2b and Extended Data Fig. 7d). Therefore, PTEN reactivation had a marked anticancer effect but was by no means curative.
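Survival differences such as the one in Fig. 2a were assessed with the log-rank test. As a hedged illustration of what that test computes, the sketch below implements the standard two-group log-rank (Mantel-Cox) statistic in plain Python. The function name and the toy survival times are hypothetical, and the paper does not state which software the authors used.

```python
import math
from typing import Sequence, Tuple


def log_rank_test(times_a: Sequence[float], events_a: Sequence[bool],
                  times_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank (Mantel-Cox) test.

    times_*  : time to death or censoring (for example, days after transplant)
    events_* : True if the animal died of disease, False if it was censored
    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    # Pool all animals, tagging each with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Animals still at risk just before time t.
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        n = n_a + n_b
        # Deaths at exactly time t (total, and in group A).
        d = sum(1 for ti, e, _ in data if e and ti == t)
        d_a = sum(1 for ti, e, g in data if e and g == 0 and ti == t)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the hypergeometric variance term needs at least two animals at risk
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative events; the test is undefined")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy example (hypothetical data): group A dies at days 1-3, group B at days 4-6.
chi2, p = log_rank_test([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")  # chi2 = 5.05, P = 0.025
```

Animals still alive at the end of follow-up would be passed with `False` in the event list, so they contribute to the number at risk without counting as deaths.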

Leukaemia-bearing mice showed leukaemic infiltrates on magnetic resonance imaging (MRI) in multiple haematopoietic compartments, the liver and intestine (Extended Data Fig. 6h, i and data not shown). Although PTEN reactivation had no overt effect on tumour growth in the lymph nodes or spleen, it visibly decreased tumour infiltration into the intestine and liver (Fig. 2c and Extended Data Fig. 6g-i). These findings were corroborated by immunohistochemistry (IHC) and flow cytometric quantification of CD4+ leukaemic cells (Fig. 2e, f). Notably, Dox treatment had a minimal impact on the proliferation or apoptosis of leukaemic cells residing in the lymph nodes and spleen, but triggered apoptosis in leukaemic cells that had disseminated into the intestine (Fig. 2g and Extended Data Fig. 7e-g). Thus, the impact of PTEN expression on disease progression is dictated by the anatomical location of the leukaemic cell.

We next assessed the phosphorylation state of key phosphatidylinositol 3-kinase (PI3K) effectors in tissue sections by IHC, and pathway functionality by positron emission tomography (PET) of 18F-fluorodeoxyglucose (FDG) uptake into leukaemia cells’®. The heterogeneous responses
Figure 2 | The impact of PTEN reactivation on leukaemia viability is influenced by anatomical site. a, Overall survival of Rag1−/− mice transplanted with 1 x 10° cells from Vav-tTA;shPten (shPten) tumours and treated with Dox (Pten on; n = 14), or untreated controls (Pten off; n = 15); P < 0.0001 by log-rank test. b, Western blot analysis of splenic tumour cells from control mice, untreated mice and mice treated with Dox for 5 days. c, Brightfield and eGFP images of lymph nodes and spleen from an untreated mouse (Pten off) and a mouse treated with Dox for 5 days (Pten on) (n = 10). d, Flow cytometric analysis of CD4, eGFP and CD8 expression in tumour cells from the peripheral blood of mice ± Dox for 5 days (n = 10). e, IHC analysis for CD3 expression in the spleen, liver and small intestine from shPten T-ALL transplanted mice ± Dox for 5 days (n = 3 per group). Scale bars, 100 µm; 20 µm in insets. f, Relative tumour infiltration in the indicated organs of transplanted Rag1−/− mice off (n = 7) and on (n = 7) Dox, quantified by flow cytometric analysis of CD4+ cells; *P < 0.05, **P < 0.01 by Student’s t-test (± s.d.). g, IHC staining for cleaved caspase 3 (CC3) in the spleen and intestine from mice 10 days after transplant with shPten T-ALL either left untreated or treated with Dox for 36 h. Representative sections from one of three mice per cohort are shown. Scale bars, 100 µm; 20 µm in insets.


correlated with the ability of PTEN to effectively suppress aberrant PI3K signalling: whereas S6 and AKT phosphorylation were reduced in disseminated leukaemic cells obtained from the intestine, they persisted in the leukaemic cells collected from the spleen of the same animal (Fig. 3a and Extended Data Fig. 8). Similarly, mice displayed a marked reduction in FDG signal stemming from the liver and intestine within 2 days of PTEN reactivation, an effect that could not simply be accounted for by loss of leukaemia burden (Fig. 3b, c). Conversely, the FDG signal emanating from the spleen and bone marrow remained strong (Fig. 3b, d and Supplementary Videos 1 and 2). The divergent responses to PTEN activation in a clonal leukaemia suggest that the control of the PI3K pathway can be markedly affected by microenvironmental factors.

Surprisingly, untreated NCr nude mice transplanted with the same number of shPten tumour cells survived as long as Dox-treated Rag1−/− recipients, and did not show a survival advantage following Dox addition (Fig. 4a). The untreated NCr recipients displayed vastly reduced intestinal dissemination of leukaemic cells compared with normal and thymectomized Rag1−/− recipients (Fig. 4b and Extended Data Fig. 9a, b, g, h), whereas spleen and lymph nodes were strongly affected (Fig. 4c and Extended Data Fig. 9c, d). Thus, genetic differences between Rag1−/− and NCr mice appear to contribute to variation in disease aggressiveness and in the response to PTEN reactivation.

Whereas Rag1−/− mice are defective in immunoglobulin and T-cell receptor gene rearrangement, NCr mice have mutations in Foxn1, a gene that controls terminal differentiation of epithelial cells in the thymus and other organs’’. Among other changes, NCr mice show decreased
Figure 3 | Tissue-dependent effects of PTEN reactivation on PI3K signalling. a, Small intestinal sections from shPten T-ALL transplanted mice ± Dox were stained with haematoxylin and eosin (H&E) or by IHC for the indicated proteins. Representative sections from one of three mice per cohort are shown. Scale bars, 100 µm; 20 µm in insets. b, Serial 18F-FDG PET analysis of shPten T-ALL transplanted mice before and 2 days after the beginning of Dox treatment. White arrows, bone marrow; red arrow, spleen; green arrow, liver/intestine. Representative images from two out of 12 analysed mice are shown. c, CD3 IHC staining of shPten tumour infiltrations in the liver of mice ± Dox 4 days after treatment initiation (n = 3 per group). Arrows highlight CD3+ tumour infiltrates. Scale bars, 500 µm. d, 18F-FDG PET/CT analysis of shPten T-ALL transplanted mice ± Dox 4 days after the beginning of Dox treatment. Full arrow, spleen; dashed arrow, liver/intestine. Representative images from two out of six analysed mice are shown.
Figure 4 | CCL25-CCR9 chemokine signalling contributes to leukaemia dissemination. a, Overall survival of Rag1−/− mice (− Dox, n = 9; + Dox, n = 7) and NCr mice (− Dox, n = 7; + Dox, n = 7) transplanted with 1 x 10° shPten leukaemia cells. Survival of Rag1−/− (− Dox) versus NCr (− Dox) mice; P < 0.0001 by log-rank test. b, IHC staining for eGFP in intestinal sections from Rag1−/− and NCr mice transplanted with shPten leukaemia (n = 3 per cohort). Scale bars, 100 µm. Numbers show mean fraction (± s.d.) of infiltrating eGFP+ tumour cells of total viable cells as determined by flow cytometry (P < 0.03, Student’s t-test). c, Representative images of lymph nodes and spleens from transplanted Rag1−/− and NCr mice (n = 7). d, CCR9 receptor expression on shPten leukaemia cells, normal CD4+ CD8+ double-positive and CD4 or CD8 single-positive thymic T cells measured by flow cytometry (n = 3 per group). e, Western blot analysis of indicated proteins in shPten leukaemic cells ± Dox after stimulation with 500 ng ml−1 CCL25 for the indicated time. f, Western blot of indicated proteins in human T-ALL cells infected with either shRenilla control or shPTEN.1522 and ± stimulation with CCL25 for 15 min. pS6 and pAKT in e and f denote pS6(S235/236) and pAKT(S473). g, Outline of competition experiment of untransduced versus shCcr9-mCherry- or control shRenilla-mCherry-transduced eGFP+ shPten tumour cells. h, Normalized ratio of mCherry+ eGFP+ cells over all eGFP+ cells isolated from spleen, liver and intestine of five mice per cohort from two independent transplantations (± s.d.). Cells were analysed by flow cytometry and normalized to the mCherry/eGFP ratio in the spleen to account for differences in transduction rate.
expression of Ccl25 (refs 18, 19), which encodes a chemokine that is
mainly expressed by epithelial cells in the thymus and small intestine 
and acts as an important chemoattractant for T cells in the gut”®”". 
CCL25 acts through CCR9, a G-protein-coupled receptor that can signal 
through the PI3K pathway and is expressed on a subset of developing 
thymocytes”*”’. Signalling through a related receptor, CCR7, is import- 
ant for leukaemia dissemination into the central nervous system”; more- 
over, the CCL25/CCR9 network is required for T-cell dissemination 
during inflammatory bowel disease, which can be countered by CCR9 
antagonists currently in clinical trials***°. CCL25 levels were decreased 
in the intestine of NCr mice (Extended Data Fig. 9e, f), whereas CCR9 
was highly expressed on the shPten leukaemia cells (Fig. 4d). Notably, 
CCR9 expression was not affected by PTEN reactivation as determined 
by fluorescence-activated cell sorting and RNA-seq analysis (Extended 
Data Fig. 9i and data not shown). 

To test whether PTEN influences T-ALL homing and survival in the intestine by modulating CCL25 signalling, shPten T-cell leukaemia isolates were treated with CCL25 (± Dox to modulate PTEN), and cell signalling and motility were assessed in short-term culture. Whereas CCL25 stimulation had little impact on PI3K signalling in the presence of PTEN, Pten knockdown sensitized cells to CCL25-induced AKT phosphorylation and, to a lesser extent, S6 phosphorylation (Fig. 4e). Similar results were obtained with two human T-ALL lines transduced with either shPTEN or a control shRNA (Fig. 4f and Extended Data Fig. 9j). CCL25 addition also increased migration of murine shPten T-ALL cells in a transwell assay, and the effect was largely abrogated by PTEN reactivation (Extended Data Fig. 9k).

Dual-colour in vivo competition experiments were performed to assess the contribution of CCR9 signalling to T-ALL dissemination (Fig. 4g). After identifying shRNAs that efficiently knock down Ccr9 (Extended Data Fig. 10a, b), eGFP+ shPten leukaemic cells were transduced with either shCcr9 or shRenilla control shRNAs co-expressing the mCherry red fluorescent protein (Fig. 4g and Extended Data Fig. 10d). Upon transplantation and subsequent disease development, mice were euthanized and the fraction of eGFP+ mCherry+ cells among all eGFP+ cells was determined in various organs (Fig. 4g, h). shCcr9-expressing T-ALL cells showed significantly decreased abundance in the intestine but not in the spleen or liver (Fig. 4h and Extended Data Fig. 10c, d). Mice transplanted with shPten leukaemic cells were also treated with a small-molecule inhibitor of CCR9 that is in clinical trials for the treatment of inflammatory bowel disease”®. Although the effects on survival were modest, leukaemia dissemination into the intestine was reduced, whereas cells in the spleen and liver were unaffected (Extended Data Fig. 10e-h and data not shown). Hence, in the intestine, PTEN suppression promotes leukaemic cell dissemination and maintenance by modulating CCL25-CCR9 signalling.
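The Fig. 4h readout is a simple normalization: in each organ, the fraction of eGFP+ leukaemic cells that also carry the mCherry-linked shRNA is divided by the same fraction in the spleen, which corrects for the starting transduction rate. A minimal sketch, with hypothetical function and argument names and made-up event counts (not data from the study):

```python
from typing import Dict


def normalized_competition_ratio(counts: Dict[str, Dict[str, int]],
                                 reference: str = "spleen") -> Dict[str, float]:
    """Normalize mCherry+ eGFP+ fractions to a reference organ.

    counts[organ]["double"]: flow cytometry events that are mCherry+ eGFP+
    counts[organ]["egfp"]:   all eGFP+ (leukaemic) events in that organ
    """
    fractions = {organ: c["double"] / c["egfp"] for organ, c in counts.items()}
    ref = fractions[reference]
    # A value below 1 means shRNA-carrying cells are depleted relative to the spleen.
    return {organ: f / ref for organ, f in fractions.items()}


# Hypothetical counts for one shCcr9-transduced recipient.
example = {
    "spleen": {"double": 400, "egfp": 1000},
    "liver": {"double": 380, "egfp": 1000},
    "intestine": {"double": 100, "egfp": 1000},
}
print(normalized_competition_ratio(example))
```

Comparing these normalized values between shCcr9 and shRenilla cohorts is what isolates the effect of Ccr9 knockdown from differences in transduction efficiency.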

In human cancers, PTEN deletions often coincide with tumour expansion, metastasis and a generally worse prognosis’”’”*, results confirmed and extended for T-cell disease in this report. Using a powerful new mouse model that enables reversible suppression of endogenous PTEN expression, we show that PTEN loss can promote tumour cell survival at distant sites by amplifying weak environmental cues, enabling tumour cells to survive in an otherwise non-supportive microenvironment. Accordingly, the promiscuous yet passive ability of PTEN to attenuate PI3K signalling” may be influenced by the nature and intensity of phosphatidylinositol (3,4,5)-trisphosphate-generating signals in different microenvironments, and targeting such tissue-specific signals might present a valid strategy to treat cancer spread. Still, the requirement for PTEN loss in tumour maintenance is not absolute and can be influenced by genetic context?” and, as shown here, by the tumour microenvironment. These observations paint a more complex picture of how PTEN inactivation drives tumour maintenance, and reveal an interplay between tumour and microenvironment that would not be predicted from studies on cultured cells. Moreover, this interplay produces a form of intratumoral heterogeneity that is independent of genotype but can affect disease progression and perhaps the clinical response to molecularly targeted therapies.
ES cell targeting and generation of Pten shRNA transgenic mice. KH2 ES cells expressing the reverse transactivator (rtTA2) from the Rosa26 promoter were electroporated with Pten shRNAs cloned into a recombination-mediated cassette exchange vector (cTGM) targeting the Col1a1 locus’, and correctly targeted and functional ES cell clones were identified and used to generate live mice by tetraploid embryo complementation. The number of each shRNA is used as an index and refers to the position of the first nucleotide of the shRNA guide strand relative to the RefSeq cDNA sequence. MEFs were generated from 13.5-day-old embryos according to standard protocols. Mice were bred to CMV-rtTA, CAGGS-rtTA and Vav-tTA transactivator lines to generate heterozygous (for example, shPten+/−;CMV-rtTA+/−) double-transgenic mice using standard breeding techniques. To induce shRNA expression, Pten and firefly luciferase (Luc) or Renilla luciferase (Renilla) shRNA mice bred to CMV-rtTA or CAGGS-rtTA mice were put on food containing 625 mg kg−1 Dox (Harlan Teklad) immediately after weaning. Dox food was also used to shut off shRNA expression in Vav-tTA transgenic animals at different time points. All mouse experiments were performed in accordance with institutional and national guidelines and regulations and were approved by the Institutional Animal Care and Use Committee (IACUC no. 06-02-97-17 (Cold Spring Harbor Laboratory) and no. 11-06-017 (Memorial Sloan Kettering Cancer Center)).


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS

Constructs and shRNAs. To identify potent shRNAs targeting murine and human Pten, various 97-bp oligonucleotides predicted from sensor-based and other shRNA design algorithms (Extended Data Fig. 4c and data not shown)*"*? were XhoI-EcoRI cloned into the miR-30 cassette of the MLP vector and tested as described previously (Extended Data Fig. 1a)**. The two most efficient murine Pten shRNAs (Pten.1522 and Pten.2049; numbers refer to the position of the first nucleotide of the shRNA guide strand relative to the RefSeq cDNA sequence) were cloned into a recombination-mediated cassette exchange (RMCE) vector (cTGM) targeting the Col1a1 locus (see Fig. 1a)***. For knockdown of human PTEN, the Pten.1522 shRNA, which showed complete overlap with the human PTEN sequence, was used. For knockdown of murine Ccr9, multiple shRNAs were designed, cloned and tested as described above. The two most efficient shRNAs, shCcr9.904 (97-mer: 5'-TGCTGTTGACAGTGAGCGCAAGGATAAGAATGCCAAGCTATAGTGAAGCCACAGATGTATAGCTTGGCATTCTTATCCTTATGCCTACTGCCTCGGA-3') and shCcr9.2357 (97-mer: 5'-TGCTGTTGACAGTGAGCGCCCCAACAGTTTACAACCTTTATAGTGAAGCCACAGATGTATAAAGGTTGTAAACTGTTGGGATGCCTACTGCCTCGGA-3'), were cloned into an LMN-cherry vector (MSCV-miR30-pgk-NeoR-IRES-mCherry) for dual-colour competition assays (see below).
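Both Ccr9 97-mers share identical flanking and loop sequences, so their structure can be checked programmatically. The sketch below assumes the commonly used miR-30 97-mer layout (18-nt 5' flank, 22-nt passenger arm, 19-nt loop, 22-nt guide arm, 16-nt 3' flank); the segment constants are read off the two sequences given in this section, while the function names are hypothetical. It splits an oligo and reports where the guide arm deviates from the reverse complement of the passenger arm.

```python
from typing import List, Tuple

# Segments shared by both Ccr9 oligos (assumed miR-30 97-mer layout).
MIR30_5P_FLANK = "TGCTGTTGACAGTGAGCG"   # 18 nt
MIR30_LOOP = "TAGTGAAGCCACAGATGTA"      # 19 nt
MIR30_3P_FLANK = "TGCCTACTGCCTCGGA"     # 16 nt
ARM_LEN = 22                            # length of each stem arm

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def split_mir30_97mer(oligo: str) -> Tuple[str, str]:
    """Return (passenger_arm, guide_arm) from a miR-30 97-mer."""
    oligo = "".join(oligo.split()).upper()  # tolerate spaces and line breaks
    expected = len(MIR30_5P_FLANK) + 2 * ARM_LEN + len(MIR30_LOOP) + len(MIR30_3P_FLANK)
    if len(oligo) != expected:
        raise ValueError(f"expected {expected} nt, got {len(oligo)}")
    if not (oligo.startswith(MIR30_5P_FLANK) and oligo.endswith(MIR30_3P_FLANK)):
        raise ValueError("miR-30 flanks not found")
    start = len(MIR30_5P_FLANK)
    passenger = oligo[start:start + ARM_LEN]
    loop = oligo[start + ARM_LEN:start + ARM_LEN + len(MIR30_LOOP)]
    guide = oligo[start + ARM_LEN + len(MIR30_LOOP):-len(MIR30_3P_FLANK)]
    if loop != MIR30_LOOP:
        raise ValueError("miR-30 loop not found")
    return passenger, guide


def stem_mismatches(passenger: str, guide: str) -> List[int]:
    """0-based positions where the guide arm differs from revcomp(passenger)."""
    return [i for i, (a, b) in enumerate(zip(reverse_complement(passenger), guide)) if a != b]


SH_CCR9_904 = ("TGCTGTTGACAGTGAGCGCAAGGATAAGAATGCCAAGCTATAGTGAAGCCA"
               "CAGATGTATAGCTTGGCATTCTTATCCTTATGCCTACTGCCTCGGA")
SH_CCR9_2357 = ("TGCTGTTGACAGTGAGCGCCCCAACAGTTTACAACCTTTATAGTGAAGCCA"
                "CAGATGTATAAAGGTTGTAAACTGTTGGGATGCCTACTGCCTCGGA")

for name, oligo in [("shCcr9.904", SH_CCR9_904), ("shCcr9.2357", SH_CCR9_2357)]:
    passenger, guide = split_mir30_97mer(oligo)
    print(name, guide, stem_mismatches(passenger, guide))
```

For both oligos the only mismatch falls at the last position of the guide arm, which is consistent with the single stem mismatch usually built into miR-30-based designs.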
ES cell targeting and generation of transgenic mice. Two potent shRNAs against murine Pten were cloned into a cassette that links eGFP and shRNA expression downstream of TRE, and targeted into a defined locus downstream of the collagen, type I, alpha 1 (Col1a1) gene in KH2 ES cells expressing the reverse transactivator (rtTA2) from the Rosa26 promoter** by RMCE (Fig. 1a and Extended Data Fig. 1a)*°*. Southern blotting showed correct transgene insertion, and Dox-inducible knockdown of endogenous PTEN was confirmed by western blot analysis (Extended Data Fig. 1b).

Germline transgenic mice were generated by tetraploid embryo complementa- 
tion. MEFs were generated from 13.5-day-old embryos according to standard pro- 
tocols. Since both shRNAs caused a similar degree of Pten knockdown and PI3K 
pathway activation and equally promoted tumorigenesis in in vivo transplantation 
experiments (Extended Data Fig. 1c, d and data not shown), we focused subsequent 
analysis on a single (shPten.1522) transgenic line. 

Mice were bred to CMV-rtTA, CAGGS-rtTA and Vav-tTA transactivator lines®'°*° to generate compound heterozygous or homozygous (for example, shPten+/−;CMV-rtTA+/− or shPten+/+;Vav-tTA+/+) double-transgenic mice using standard breeding techniques. To induce shRNA expression, Pten and firefly luciferase (Luc) or Renilla luciferase (Renilla) shRNA mice bred to CMV-rtTA or CAGGS-rtTA mice were put on food containing 625 mg kg−1 Dox (Harlan Teklad) immediately after weaning. As predicted from knockout mice”**, most double-transgenic mice harbouring the inducible shPten allele together with the broadly expressing CMV-rtTA or CAGGS-rtTA3 transactivator strains®*’ developed tumours within 12 months of Dox addition (Extended Data Fig. 2; data not shown). Dox food was also used to shut off shRNA expression in Vav-tTA transgenic animals at different time points. For the Vav-tTA;shPten survival studies, Vav-tTA+;shPten+ mice (n = 49) and controls (Vav-tTA+;shLuc+ (n = 20), Vav-tTA*;shPten” (n = 68), Vav-tTA+;shPten+ + Dox (n = 10)) were generated and analysed. No difference in phenotype was observed between heterozygous (shPten+/−;Vav-tTA+/−) and homozygous (shPten+/+;Vav-tTA+/+) mice. All mouse experiments were performed in accordance with institutional and national guidelines and regulations and were approved by the Institutional Animal Care and Use Committee (IACUC no. 06-02-97-17 (Cold Spring Harbor Laboratory) and no. 11-06-017 (Memorial Sloan Kettering Cancer Center)).
Statistics and reagents. For all murine survival studies, a group size of at least five 
animals per condition was chosen, which allowed the detection of twofold differ- 
ences in survival with a power of 0.89, assuming a two-sided test with a significance 
threshold α of 0.05 and a standard deviation of less than 50% of the mean. For the primary animals, all mice with the correct genotype were included in the analysis. For the transplantation experiments, all mice receiving similar amounts of transplanted cells, as determined by flow cytometric and/or whole-body immunofluorescent evaluation 5-10 days after transplant, were included in the analysis. For the reactivation and treatment experiments, animals were randomly assigned to the different groups from the non-selected transplanted group of mice 5-10 days after transplant. Blinding of animals in the reactivation/inhibitor treatment studies was not feasible, because the local animal housing facility required cages containing special food or treatment to be marked.
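The stated power can be checked with a back-of-the-envelope calculation. Assuming that a twofold difference means comparing means m and 2m with a common standard deviation of 0.5 m (standardized effect size d = 2), five animals per group, and a normal approximation to a two-sided two-sample test, the power comes out close to the 0.89 quoted above. The paper does not say which method was used; this is only a sketch under those assumptions (the helper name is hypothetical), and a calculation based on the t distribution would give a somewhat lower value for groups this small.

```python
from statistics import NormalDist


def approx_power_two_sample(effect_size: float, n_per_group: int,
                            alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test of means
    with equal group sizes and equal standard deviations."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    shift = effect_size * (n_per_group / 2) ** 0.5      # expected z statistic under H1
    # Probability of falling in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Means m and 2m with SD = 0.5 * m give d = (2m - m) / (0.5 * m) = 2.
power = approx_power_two_sample(effect_size=2.0, n_per_group=5)
print(round(power, 2))  # 0.89
```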

Appropriate statistical tests were applied as indicated, including non-parametric 
tests for experiments where sample size was too small to assess normal distribution. 
All t-tests and derivatives were two-sided. For all tests, variation was calculated as 
standard deviation and included in the graphs as error bars. To investigate whether 
PTEN expression has an impact on tumour dissemination in human T-ALL, we per- 
formed IHC staining for PTEN on bone marrow sections from 31 patients with newly 
diagnosed T-ALL, for which clinical data on disease dissemination was available. 
Owing to the relatively low number of patient specimens and because the variables were nonlinear, we analysed the data in a contingency table using Fisher’s exact test. We also re-analysed the contingency table using Berger’s test, with similar results”.
For probing an association between PTEN expression status and intestinal infil- 
tration in human T-cell lymphoma patients, the same statistical tests were applied. 
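For a 2 × 2 table (PTEN high/low versus disseminated disease yes/no), Fisher's exact test holds the row and column totals fixed and sums hypergeometric probabilities. The sketch below implements the common two-sided version (summing all tables no more probable than the observed one) with the Python standard library; the function name and the example table are hypothetical and are not the patient counts from this study.

```python
from math import comb
from typing import Sequence


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher's exact test for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Relative tolerance so that tables tied with the observed one are counted.
    p = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical table: rows = PTEN low / PTEN high, columns = disseminated yes / no.
print(round(fisher_exact_two_sided([[8, 2], [1, 5]]), 4))  # 0.035
```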
All antibodies used for western blot analysis were purchased from Cell Signaling Technology unless otherwise specified, including the antibodies against PTEN (cat no. 9188), pAKT(S473, cat no. 4060), pAKT(T308, cat no. 2965), AKT (cat no. 4691), S6 (cat no. 2317), pS6(S235/236, cat no. 4858) and cleaved Notch1 (cat no. 4147). For direct intracellular flow cytometric analysis of pS6(S235/236), Pacific Blue-conjugated antibodies were purchased from Cell Signaling (cat no. 8520). Antibodies for flow cytometry were purchased from BioLegend unless otherwise specified. Mouse antibodies included CD3 (clone 145-2C11), CD4 (clone GK1.5), CD8 (clone 53-6.7), Thy1.2 (clone 30-H12), CD45 (clone 30-F11), CCR9 (clone 9B1), CD11b (clone M1-70), Gr-1 (clone RB6-8C5), CD44 (clone IM7) and CD25 (clone PC61). Human T-ALL cell lines HBP-ALL and TALL1 were a kind gift from I. Aifantis. All cell lines were tested for absence of mycoplasma and authenticated by flow cytometry and western blotting.
Transplantation experiments. For transplantation, single-cell suspensions were generated from primary tumours, and 1 X 10° cells were injected via the tail vein into sublethally irradiated (450 rad) female Rag1−/− (on a C57B6 background, cat no. 2216) or NCr nude recipient mice (on an inbred albino background, cat no. 2019). All mice used as transplant recipients were purchased from Jackson Laboratory. Mice were monitored by serial flow cytometric analysis of the peripheral blood. Once eGFP+ cells reached >5% of total leukocytes, cohorts of mice were started on Dox-containing food as indicated.
Analysis of human T-ALL and PTCL patient samples. For the analysis of survival of PTEN-normal versus PTEN-altered patients with T-ALL, published genomic and mRNA expression data on patients with T-ALL were used (accession number GSE28703)’°. PTEN altered (n = 20) included patients with PTEN deletion, mutation, underexpression (<0.8 sigma after z scoring) or any combination of such alterations, and PTEN normal (n = 62) included all other patients with available data. For IHC analysis, samples from patients with T-ALL were analysed as individual bone marrow biopsies. For the T-cell lymphoma samples, tissue microarrays were constructed as previously published’ using a fully automated Beecher Instrument, ATA-27. The study cohort comprised 84 patients with T-cell lymphomas and 31 patients with T-ALL. The T-cell lymphomas could be subdivided into enteropathy-associated T-cell lymphoma (4), peripheral T-cell lymphoma involving bowel or gastrointestinal tract (7), T/NK cell lymphoma (4), angioimmunoblastic T-cell lymphoma (9), anaplastic large cell lymphoma (14), and peripheral T-cell lymphoma without bowel or gastrointestinal tract involvement (46). All samples were consecutively ascertained at the Memorial Sloan-Kettering Cancer Center (MSKCC) between 2001 and 2012. Use of tissue samples was approved with an Institutional Review Board Waiver and by the Human Bio-specimen Utilization Committee. All biopsies were evaluated at MSKCC, and the histological diagnosis was based on H&E staining. The PTEN antibody (rabbit monoclonal antibody from Cell Signaling, 138G6, no. 9559) was used at a 1:30 dilution. IHC analysis was performed on the Ventana Discovery XT automated platform according to the manufacturer’s instructions. PTEN staining was scored as 0, 1 or 2, with endothelial cells and macrophages serving as positive internal controls: 0 = no staining of tumour cells; 1 = weak staining of tumour cells compared with endothelial cells and macrophages; 2 = strong staining of tumour cells compared with endothelial cells and macrophages. Representative images were taken with an Olympus BX41 microscope and DP20 camera using a ×60 objective.
MRI. The mice were anesthetized with 2-3% isoflurane delivered in O2 and allowed to breathe spontaneously during the imaging study. The mice were positioned supine in a custom-made acrylic cradle fitted with a snout mask for continuous delivery of anaesthesia. Non-invasive, MRI-compatible monitors (pulse oximetry, respiratory rate and rectal temperature probe; SA Instruments) were positioned for continuous monitoring of vital signs while the animal underwent MRI. During imaging, body temperature was kept strictly within 36.5-37.5 °C using a computer-assisted air heating system (SA Instruments). All imaging was performed on a 9.4T/20 MRI instrument interfaced to a Bruker Avance console and controlled by Paravision 5.0 software (Bruker Biospin). A volume radiofrequency coil (diameter 11.2 mm) in transmit and receive mode was used for all imaging acquisitions. Following localizer anatomical scout scans, a 2D multi-slice T2-weighted RARE sequence along the coronal slice direction with fat suppression was obtained with the following parameters: TR = 12,500 ms, TE = 40 ms, RARE factor = 8, NA = 8, FOV = 6.0 × 6.0 cm2 (256 × 256 matrix), yielding an in-plane resolution of 0.234 × 0.234 mm2; slice thickness = 0.9 mm; total scanning time = 10 min 40 s.
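The in-plane resolution follows directly from the field of view and the acquisition matrix (pixel size = FOV / matrix size). A trivial check, using a hypothetical helper name:

```python
def pixel_size_mm(fov_cm: float, matrix: int) -> float:
    """In-plane pixel size in mm for a square field of view and matrix."""
    return fov_cm * 10.0 / matrix  # convert cm to mm, then divide by the number of pixels


size = pixel_size_mm(6.0, 256)
voxel_mm3 = size * size * 0.9  # 0.9 mm slice thickness
print(f"{size:.3f} mm in-plane, {voxel_mm3:.4f} mm^3 per voxel")  # 0.234 mm in-plane, 0.0494 mm^3 per voxel
```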
18F-FDG-PET analysis. For PET analysis, mice were fasted for 6 h before intravenous injection of 0.5 mCi 18F-FDG. Mice were kept for 1 h under isoflurane anaesthesia and subsequently imaged on a Focus 120 microPET (Siemens) or in some cases an Inveon MicroPET/CT (Siemens). Image normalization and analysis were performed using the ASI Pro MicroPET analysis software and the Inveon Workplace software package (Siemens Medical Solutions).

CGH. CGH experiments were performed using standard Agilent 244k mouse 
whole genome arrays, and hybridizations were carried out according to the man- 
ufacturer’s recommendations. Data processing, normalization and segmentation 
were carried out as described”. 

Multiplex-FISH (M-FISH)/spectral karyotyping analysis. Cells were cultured in RPMI-1640 with L-glutamine (PAA) supplemented with 10% FBS, 1% penicillin-streptomycin, 50 µM mercaptoethanol, 50 U ml−1 human interleukin 2 and 5 µg ml−1 concanavalin A for 48-72 h at 37 °C and 5% CO2. To prepare metaphases, colcemid at a final concentration of 0.1 µg ml−1 was added to the cells for 120 min. Centrifugation at 300g for 8 min was followed by hypotonic treatment in pre-warmed 0.075 M KCl for 20 min at 37 °C. Cells were fixed in cold ethanol/acetic acid (3:1) and air-dried slides were prepared.

The M-FISH hybridization was performed with a panel of mouse M-FISH probes (21XMouse mFISH probe kit, MetaSystems) according to the manufacturer’s instructions. In brief, the probes were denatured at 75 °C for 5 min and pre-annealed at 37 °C for 30 min. Slides were incubated in 0.1× SSC for 1 min, denatured in 0.07 N NaOH at room temperature for 1 min, quenched in 0.1× SSC at 4 °C and 2× SSC at 4 °C for 1 min each, dehydrated in an ethanol series and air dried. M-FISH probe was applied onto the slides and hybridization was performed for 48 h in a humidified chamber at 37 °C. Following hybridization, the slides were washed in 0.4× SSC at 72 °C for 2 min, followed by a wash in 2× SSC, 0.05% Tween 20 at room temperature for 2 min. Counterstaining was performed using DAPI (4’,6-diamidino-2-phenylindole), and slides were mounted with phenylenediamine.

Slides were visualized using a Leica DMRXA-RF8 epifluorescence microscope equipped with special filter blocks (Chroma Technology). For image acquisition, a
Sensys CCD camera (Photometrics) with a Kodak KAF 1400 chip was used. Both 
the camera and microscope were controlled with Leica Q-FISH software (Leica Micro- 
systems Imaging Solutions). Images were analysed using the Leica MCK-Software 
package (Leica Microsystems Imaging Solutions). 
Western blotting, flow cytometry and antibodies. Western blotting was performed according to standard protocols. In brief, tissues were either snap frozen in liquid nitrogen and homogenized, or dissociated into single cells using a 100 µm nylon mesh (Cell Strainer, BD Falcon). Protein was extracted using standard protein lysis buffer (20 mM Tris (pH 7.5), 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton, 2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate, 1 mM Na3VO4) supplemented with a protease inhibitor cocktail (Complete, Roche Diagnostics) and quantified using a Bradford Protein Assay (Bio-Rad). Proteins were separated on a polyacrylamide gel (ProteanlII, Bio-Rad) and transferred to a PVDF membrane (Immobilon-F, Millipore). Protein bands were resolved using fluorochrome-conjugated secondary antibodies on an Odyssey scanner (Licor). All western blot experiments were replicated at least twice.

For the analysis of pAkt and pS6 induction after CCL25 stimulation, shPten T-ALL cells derived from spleen tissues of tumour-bearing shPten mice and adapted to cell culture were either left untreated or treated with Dox for 4 days to reactivate PTEN, starved overnight at 0.5% FBS and then treated with CCL25 for the indicated times. For the CCL25 stimulation assay of the human T-ALL cell lines HBP-ALL and TALL1, the cells were infected with a retroviral vector co-expressing a shPTEN.1522 or Renilla control shRNA and a puromycin-resistance cassette linked to GFP. After selection with puromycin for 5 days, the cells were starved overnight at 0.1% FBS and then treated with CCL25 for 15 min before collection for immunoblot analysis.

For flow cytometric analysis (FACS), single-cell suspensions were brought to a
concentration of 1 × 10° cells per ml in FACS buffer (PBS, pH 7.4, 1% BSA) and
stained with the indicated antibodies as per the manufacturer’s protocol. After washing,
cells were measured on a Guava easyCyte (Millipore) or LSRII (Becton Dickinson)
flow cytometer. Cell sorting was performed on a Becton Dickinson FACSAria II.
For intracellular pS6 measurement, cells were fixed with 2% paraformaldehyde
and permeabilized with methanol before staining for pS6 as previously
described’.
CCL25 expression. For quantification of CCL25 chemokine levels in the intestine
of Rag1−/− and NCr nude mice, parts of the jejunum and ileum were dissected
from euthanized animals, rinsed in phosphate-buffered saline (pH 7.4) and snap
frozen in liquid nitrogen. Tissues were homogenized in T-PER Tissue Protein Extraction
Buffer (Pierce Biotechnology) supplemented with a protease inhibitor cocktail
(Complete, Roche Diagnostics) to prevent protein degradation during and after
homogenization. Protein extracts were centrifuged at 20,000 r.c.f., supernatants
were collected, and total protein content was assayed using the Bradford Protein
Assay (Bio-Rad). Tissue homogenates were analysed for CCL25 protein levels by a
three-step sandwich enzyme-linked immunosorbent assay (ELISA) according to the
manufacturer’s instructions (R&D Systems).
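The ELISA readout has to be converted into a CCL25 amount via a standard curve and, for comparison between tissues, normalized to the Bradford total-protein value. The text defers the curve fitting to the manufacturer's instructions, so the sketch below assumes a four-parameter logistic (4PL) curve whose parameters have already been fitted, a curve expressed in pg/ml and a protein concentration in mg/ml. All function names and parameter values are hypothetical.

```python
def four_pl(conc, a, b, c, d):
    """Four-parameter logistic: a = signal at zero analyte, d = signal at saturation,
    c = inflection point (same units as conc), b = slope factor."""
    return d + (a - d) / (1.0 + (conc / c) ** b)


def conc_from_signal(signal, a, b, c, d):
    """Invert the 4PL; only defined for signals strictly between a and d."""
    if not (min(a, d) < signal < max(a, d)):
        raise ValueError("signal outside the range of the standard curve")
    return c * ((a - d) / (signal - d) - 1.0) ** (1.0 / b)


def ccl25_pg_per_mg(signal, curve, dilution, protein_mg_per_ml):
    """CCL25 (pg) per mg total protein for one homogenate.
    Assumes the standard curve is in pg/ml and the Bradford value is in mg/ml."""
    pg_per_ml = conc_from_signal(signal, *curve) * dilution
    return pg_per_ml / protein_mg_per_ml


# Hypothetical fitted curve (a, b, c, d) and one sample read at 2-fold dilution
curve = (0.05, 1.2, 500.0, 3.0)
od = four_pl(100.0, *curve)
print(ccl25_pg_per_mg(od, curve, dilution=2.0, protein_mg_per_ml=4.0))
```

A 4PL is a common choice for sandwich ELISA standards; a kit may recommend a different model, in which case only `four_pl` and its inverse would change.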


Transwell migration assay. For transwell migration assays, 5 × 10° cells with or
without Dox treatment were starved overnight (12 h) in RPMI containing 1% serum,
and the next day 3 × 10° viable cells were plated in 200 µl in the upper part
of a 24-well Boyden chamber insert with a membrane pore size of 8 µm (Greiner
Bio-One). The lower part of the chamber was filled with 600 µl medium containing
500 ng ml−1 recombinant CCL25 (R&D Systems). After a 4 h incubation at
37 °C and 7.5% CO2, migrated cells in the lower chamber were counted using a
Guava easyCyte cell counter (Millipore). Transwell migration experiments were
run in triplicate for each condition on two independent tumour isolates.
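The text states that migrated cells were counted in triplicate wells but not how counts were expressed. One common option, assumed here, is the percentage of plated cells recovered in the lower chamber, summarized as mean ± sample s.d. across replicate wells. The counts in the example are hypothetical.

```python
from statistics import mean, stdev


def percent_migrated(counts_lower_chamber, cells_plated):
    """Migrated cells per well as a percentage of the cells plated in the insert."""
    return [100.0 * n / cells_plated for n in counts_lower_chamber]


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator) of replicate wells."""
    return mean(values), stdev(values)


# Hypothetical triplicate counts for one condition
pct = percent_migrated([30000, 33000, 27000], cells_plated=300000)
print(pct, mean_sd(pct))
```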
Histology and IHC. Organ samples were fixed in fresh 4% paraformaldehyde at
4 °C overnight and processed by routine histological procedures for embedding
in paraffin. 5-µm sections of the samples from at least three different animals
per group were placed side by side on microscope slides to enable cross-comparison
within a slide after H&E staining or IHC staining with the antibodies
indicated. Antibodies used were GFP (D5.1) XP, PTEN (D4.3), p-AKT (S473) (D9E),
p-ribosomal protein S6 (S235/236) (D57.2.2E), cleaved caspase 3 (Asp175) (all: Cell
Signaling Technology), Ki67 (VP-K451, Vector Laboratories) and CD3 (A0452)
(DAKO). Stained slides were scanned with a Pannoramic Scan 250 Flash or MIDI
system and images acquired using Pannoramic Viewer 1.15.2 (both: 3DHistech).
Additional images were taken on a Zeiss Axio Imager.Z2 system.

shCcr9 hairpin design and competition experiments. shRNAs targeting Ccr9
were generated as previously described’ and cloned into a retroviral vector coexpressing
mCherry (LMN-C*). Knockdown of Ccr9 was quantified in a murine T-ALL
cell line by flow cytometry using an Alexa 647-conjugated antibody against murine
CCR9 (clone 9B1, eBioscience). The two most efficient shRNAs, shCcr9.904 (97-mer:
5′-TGCTGTTGACAGTGAGCGCAAGGATAAGAATGCCAAGCTATAGTGAAGCCACAGATGTATAGCTTGGCATTCTTATCCTTATGCCTACTGCCTCGGA-3′)
and shCcr9.2357 (97-mer:
5′-TGCTGTTGACAGTGAGCGCCCCAACAGTTTACAACCTTTATAGTGAAGCCACAGATGTATAAAGGTTGTAAACTGTTGGGATGCCTACTGCCTCGGA-3′),
were used for subsequent assays.
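Both 97-mers share the same miR-30 backbone. The sketch below splits an oligo into its parts, assuming the layout suggested by the spacing in Extended Data Fig. 4c: an 18-nt 5′ flank, a 22-nt passenger (sense) strand, a 19-nt loop, a 22-nt guide strand and a 16-nt 3′ flank. It then checks that the guide is the reverse complement of the passenger. In both shCcr9 sequences the pairing holds everywhere except at the passenger's first base. The function names are hypothetical, and this is a consistency check, not a hairpin design tool.

```python
FLANK_5 = "TGCTGTTGACAGTGAGCG"   # 18-nt 5' flank (assumed layout)
LOOP = "TAGTGAAGCCACAGATGTA"     # 19-nt loop
FLANK_3 = "TGCCTACTGCCTCGGA"     # 16-nt 3' flank
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

SH_CCR9_904 = ("TGCTGTTGACAGTGAGCGCAAGGATAAGAATGCCAAGCTATAGTGAAGCCACAGATGTA"
               "TAGCTTGGCATTCTTATCCTTATGCCTACTGCCTCGGA")
SH_CCR9_2357 = ("TGCTGTTGACAGTGAGCGCCCCAACAGTTTACAACCTTTATAGTGAAGCCACAGATGTA"
                "TAAAGGTTGTAAACTGTTGGGATGCCTACTGCCTCGGA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def split_mir30_97mer(oligo):
    """Return (passenger, guide) 22-mers from flank-passenger-loop-guide-flank."""
    oligo = oligo.upper()
    if len(oligo) != 97:
        raise ValueError(f"expected 97 nt, got {len(oligo)}")
    if not (oligo.startswith(FLANK_5) and oligo[40:59] == LOOP and oligo.endswith(FLANK_3)):
        raise ValueError("flanks or loop differ from the assumed miR-30 backbone")
    return oligo[18:40], oligo[59:81]


def guide_pairs_with_passenger(passenger, guide):
    """True if the guide is the reverse complement of the passenger at every position
    except the passenger's first base (the partner of the guide's 3'-terminal base)."""
    return reverse_complement(guide)[1:] == passenger[1:]


for name, oligo in [("shCcr9.904", SH_CCR9_904), ("shCcr9.2357", SH_CCR9_2357)]:
    passenger, guide = split_mir30_97mer(oligo)
    print(name, passenger, guide, guide_pairs_with_passenger(passenger, guide))
```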
For the infection of tumour cells, eGFP+ shPten T-ALL cells were grown on OP9-DL1
feeders in the presence of 10 ng ml−1 mIL7 in Opti-MEM GlutaMAX medium
(Gibco/Life Technologies) and infected with either shCcr9 or shRenilla control
shRNAs. Approximately 0.75 × 10° infected shPten cells (20–30% mCherry+) were
transplanted into recipient Rag1−/− mice irradiated with 450 rad, and monitored
for disease development. Diseased mice were euthanized, and spleen, bone marrow,
liver, lung, small intestine and kidney tissues were collected and minced to
generate single-cell suspensions for flow cytometric measurement of eGFP- and
mCherry-positive cell fractions. Relative mCherry fractions in the different organs
were determined by normalizing to the spleen fraction, set as 100%.
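The spleen normalization is a simple rescaling: each organ's mCherry+ fraction is divided by the spleen fraction of the same mouse and multiplied by 100. The sketch assumes the input is the percentage of mCherry+ cells measured in each organ of one mouse; the organ values are hypothetical.

```python
def relative_to_spleen(mcherry_pct_by_organ, reference="spleen"):
    """Express each organ's mCherry+ fraction relative to the reference organ (spleen = 100%)."""
    ref = mcherry_pct_by_organ[reference]
    if ref <= 0:
        raise ValueError("reference fraction must be positive")
    return {organ: 100.0 * pct / ref for organ, pct in mcherry_pct_by_organ.items()}


# Hypothetical mCherry+ percentages from one recipient mouse
print(relative_to_spleen({"spleen": 25.0, "intestine": 5.0, "liver": 20.0}))
```

A relative value well below 100% in the intestine would indicate that shCcr9-expressing cells are depleted there compared with the spleen.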

CCR9 inhibitor treatment. The CCR9 inhibitor CCX8037 was kindly provided
by ChemoCentryx**"". For cell culture treatment, different concentrations of CCX8037
were used as indicated. For in vivo experiments, mice were treated with 30 mg kg−1
CCX8037 or vehicle (1% HPMC) by subcutaneous injection every 12 h.
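Weight-based dosing translates into a per-injection amount that depends on each animal's body weight. The sketch below shows the arithmetic for the stated 30 mg kg−1 every 12 h. The 25 g body weight and the 3 mg ml−1 formulation concentration are hypothetical values chosen only for illustration; the text does not give them.

```python
def dose_mg(body_weight_g, dose_mg_per_kg=30.0):
    """Drug mass for one injection: mg/kg multiplied by body weight converted from g to kg."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def injection_volume_ml(body_weight_g, formulation_mg_per_ml, dose_mg_per_kg=30.0):
    """Volume of a formulation at the given concentration that delivers one dose."""
    return dose_mg(body_weight_g, dose_mg_per_kg) / formulation_mg_per_ml


def daily_dose_mg(body_weight_g, interval_h=12.0, dose_mg_per_kg=30.0):
    """Total drug per 24 h when injecting at a fixed interval."""
    return dose_mg(body_weight_g, dose_mg_per_kg) * (24.0 / interval_h)


# Hypothetical 25 g mouse and 3 mg/ml formulation
print(dose_mg(25.0), injection_volume_ml(25.0, 3.0), daily_dose_mg(25.0))
```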

Clonality analysis of murine shPten tumours. Clonality analysis by PCR amplification
and sequencing of murine TCR sequences was performed as previously
described”. In brief, clonality was assayed at V-DJ and D-J rearrangements using a
mixture of 20 family-specific upstream primers located within Vβ gene segments,
consensus primers located 5′ of the Dβ1 and Dβ2 gene segments and consensus
downstream primers located 3′ of the Jβ1 and Jβ2 gene segments. PCR products were
analysed by direct Sanger sequencing.

Profiling of Notch1 mutations by Sanger sequencing. Vav-tTA;shPten tumours
were analysed for Notch1 hotspot mutations located in exons 26, 27 and 34 by Sanger
sequencing as described previously”, including one new oligonucleotide primer
pair: Ex34B-f: 5′-GCCAGTACAACCCACTACGG-3′; Ex34B-r: 5′-CCTGAAGCACTGGAAAGGAC-3′.

RNA-seq data generation and bioinformatic analysis. Total RNA was extracted
using Trizol (Invitrogen) from shPten leukaemic cells, on or off Dox, that had been
purified by magnetic-bead sorting for CD4 expression. RNA quality and quantity were
assessed using the Agilent RNA 6000 Nano Chip (Agilent Technologies) and a Qubit 2.0
fluorometer (Life Technologies), respectively. RNA-seq libraries were prepared from
1 µg of total RNA per sample using the TruSeq RNA Sample Preparation Kit v2
(Illumina) following the standard protocol with a slight modification (10 PCR cycles).
RNA-seq library quality and quantity were assessed using the Agilent DNA 1000
Chip and the KAPA Library qPCR kit (Kapa Biosystems). Libraries were clustered
on an Illumina cBot and sequenced on the HiSeq2000 (2 × 100-bp reads) using
Illumina chemistry.

The RNA-seq paired-end reads were mapped to the mouse mm9 genome and its
corresponding transcript sequences using an in-house mapping and quality assessment
pipeline’*. Transcript expression levels were estimated as fragments per kilobase
of transcript per million mapped reads (FPKM), and gene FPKMs were computed
by summing the transcript FPKMs for each gene using the Cuffdiff2 program“.
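FPKM normalizes the fragment count of a transcript by its length (in kilobases) and by library size (in millions of mapped fragments), so FPKM = fragments × 10^9 / (length in bp × total mapped fragments). Gene-level values are the sum over that gene's transcripts. The sketch assumes fragment counts have already been assigned to transcripts; Cuffdiff2 itself estimates that assignment statistically and uses effective transcript lengths, which this illustration does not reproduce. Names and numbers are hypothetical.

```python
from collections import defaultdict


def transcript_fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def gene_fpkm(transcripts, total_mapped_fragments):
    """transcripts: iterable of (gene_id, fragment_count, length_bp); sums transcript FPKMs per gene."""
    totals = defaultdict(float)
    for gene, fragments, length_bp in transcripts:
        totals[gene] += transcript_fpkm(fragments, length_bp, total_mapped_fragments)
    return dict(totals)


# Hypothetical library of 10 million mapped fragments, two isoforms of g1 and one of g2
print(gene_fpkm([("g1", 100, 1000), ("g1", 50, 500), ("g2", 0, 2000)], 10_000_000))
```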
We called a gene ‘expressed’ in a given sample if it had an FPKM value ≥ 0.5, a cut-off chosen
on the basis of the distribution of FPKM gene expression levels, and excluded genes that were
not expressed in any sample from the final gene expression data matrix for downstream
analysis. Differentially expressed genes were identified using limma* and the
false discovery rate was estimated by the Benjamini–Hochberg method**. GSEA*”**
and the Database for Annotation, Visualization and Integrated Discovery (DAVID
v6.7)*° were used to assess pathway enrichment. All mouse RNA-seq data sets have been
submitted to the European Nucleotide Archive (ENA) and can be accessed under
accession number PRJEB5498 at http://www.ebi.ac.uk/ena/data/view/PRJEB5498.
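The expression filter and the false-discovery-rate step can be sketched independently of limma's moderated statistics. The first function keeps a gene if its FPKM reaches the 0.5 cut-off in at least one sample. The second applies the Benjamini–Hochberg step-up adjustment to a list of P values: sort ascending, scale each by m/rank, and enforce monotonicity from the largest P value downwards. This does not reimplement limma; the P values would come from the differential-expression test, and the inputs here are hypothetical.

```python
def expressed_genes(fpkm_by_gene, threshold=0.5):
    """Keep genes with FPKM >= threshold in at least one sample (fpkm_by_gene: gene -> list of FPKMs)."""
    return {gene: values for gene, values in fpkm_by_gene.items()
            if any(v >= threshold for v in values)}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(expressed_genes({"g1": [0.1, 0.6], "g2": [0.2, 0.4]}))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```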
GSEA. To test for mouse shPten signature enrichment in PTEN-disrupted human
T-ALL, we established a shPten-dependent signature using the 100 most upregulated
genes in shPten T-ALL samples (untreated, n = 3) against PTEN-restored
samples (Dox-treated, n = 4). Publicly available human T-ALL gene expression
profiles (GSE28703, n = 47) were processed using RMA (quantile normalization)
and supervised for PTEN status (PTEN disrupted, including PTEN deletion,
mutation or both, n = 10; PTEN wild-type, n = 37) according to the published
sample genetic features annotation”. Statistical significance of GSEA results was
assessed using 1,000 sample permutations. For enrichment of the human PTEN T-ALL
signature in mouse shPten T-ALL (Dox-off) profiles against PTEN-restored (Dox-on)
profiles, a human PTEN-disrupted signature was generated by including the
100 most upregulated genes in PTEN-disrupted versus PTEN-wild-type T-ALL
samples. Mouse genes were ranked by supervising untreated to Dox-treated shPten
T-ALL samples. Statistical significance of human PTEN-disrupted signature enrichment
was assessed using 1,000 gene set permutations.
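The signature-and-enrichment logic can be illustrated with a minimal GSEA-style sketch. Assumptions: the signature is the top-n genes by a per-gene log fold change; genes are ranked by a precomputed metric (the GSEA software defaults to a signal-to-noise metric instead); the enrichment score is the weighted running sum used by GSEA with weight p = 1; and significance comes from random gene sets of the same size, analogous to the gene set permutations described above. The sample-permutation variant would shuffle sample labels and re-rank genes, which this sketch does not do. It also omits normalized enrichment scores and FDR, and all names are hypothetical.

```python
import random


def top_upregulated(log_fold_changes, n=100):
    """Signature set: the n genes with the largest log fold change."""
    ranked = sorted(log_fold_changes, key=log_fold_changes.get, reverse=True)
    return set(ranked[:n])


def enrichment_score(ranked_genes, gene_set, metric, p=1.0):
    """Running-sum enrichment score; ranked_genes runs from most up to most down."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == len(ranked_genes):
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    hit_total = sum(abs(metric[g]) ** p for g, h in zip(ranked_genes, hits) if h)
    if hit_total == 0:
        raise ValueError("all hit genes have a zero ranking metric")
    miss_step = 1.0 / (len(ranked_genes) - n_hits)
    running = best = 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        # Step up by the gene's weighted metric on a hit, step down uniformly on a miss
        running += abs(metric[gene]) ** p / hit_total if is_hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_p(ranked_genes, gene_set, metric, n_perm=1000, seed=0):
    """Nominal P value from random gene sets of the same size (same-sign null scores only)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set, metric)
    size = len(set(gene_set) & set(ranked_genes))
    null = [enrichment_score(ranked_genes, set(rng.sample(ranked_genes, size)), metric)
            for _ in range(n_perm)]
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (len(same_sign) + 1)


# Hypothetical metric; ranking is by decreasing metric
metric = {"a": 2.0, "b": 1.5, "c": -0.5, "d": -1.0, "e": -2.0}
ranked = sorted(metric, key=metric.get, reverse=True)
print(gene_set_permutation_p(ranked, {"a", "b"}, metric, n_perm=200))
```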
Extended Data Figure 1 | Pten shRNA-transgenic mice enable conditional
expression of PTEN and develop tumours after prolonged Pten knockdown.
a, Western blot analysis of PTEN protein knockdown in NIH 3T3 cells infected
with different Pten shRNAs at low multiplicity of infection. l.e., long exposure;
s.e., short exposure. b, PTEN protein knockdown assessed by western blot in
ES cell clones targeted with two different Pten shRNAs, either treated with Dox
or left untreated. c, MEFs from Rosa26-rtTA;shPten.1522 transgenic mice, wild-type
control mice, or Pten+/− mice were treated with Dox for the indicated
times and analysed for PTEN, GFP and ACTB expression by western blot.
d, Overall survival of mice receiving bone marrow cells from tTA-transgenic
mice infected with an inducible TRE-GFP-miR-30 (TGM) retroviral vector
expressing shPten.1522, shPten.2049 or control after irradiation with 600 rad.
e, Fluorescence image of a CAGGS-rtTA;shPten.1522 mouse on Dox for 5 days
and a CAGGS-rtTA-only control mouse. f, Flow cytometric analysis of the
peripheral blood of a CAGGS-rtTA;shPten.1522 mouse on Dox and an off-Dox
control mouse for myeloid (CD11b) and GFP marker expression 10 days after
initiating Dox food. g, Overall survival curve of CMV-rtTA;shPten.1522
double-transgenic and control mice (single-transgenic shPten.1522 or CMV-rtTA).
Dox treatment for shRNA induction was started after weaning (at
~4 weeks of age). h, Situs of a tumour-bearing CAGGS-rtTA;shPten.1522
double-transgenic mouse. A large thymic tumour (full arrow), as well as
enlarged lymph nodes (dashed arrows) and spleen (arrowhead), are visible.
i, Immunohistochemical staining of kidney sections from a CAGGS-rtTA;shPten.1522
mouse for the indicated antigens. Arrows highlight a tumour
infiltrate around a kidney venule. Scale bars, 100 µm.
Extended Data Figure 2 | Vav-tTA;shPten transgenic mice with targeted
shPten expression in the haematopoietic lineage display thymic hyperplasia
by 6 weeks, and a subset develops thymic tumours infiltrating multiple
peripheral organs. a, Brightfield (left) and fluorescence (right) images of
spleen and thymus from Vav-tTA;shLuc (control) and Vav-tTA;shPten double-transgenic
mice at 5 weeks. b, Fluorescence-activated cell sorting analysis of
spleen and thymus single-cell suspensions from Vav-tTA;shLuc/shPten mice
for CD4/CD8 expression. c, IHC analysis of thymic tissue from 6-week-old
Vav-tTA;shLuc and Vav-tTA;shPten mice. Sections were stained with
haematoxylin and eosin (H&E), anti-GFP or anti-PTEN antibodies, showing
heterogeneous GFP staining and correspondingly variable PTEN knockdown.
Scale bars, 200 µm for H&E and GFP, 100 µm for PTEN. The insets are ×2
magnifications. d, Thymus weight of 6-week-old Vav-tTA;shLuc and Vav-tTA;shPten
mice (n = 5 for both groups, P < 0.006 by t-test). e, Brightfield and
GFP-fluorescence images of a Vav-tTA;shPten mouse with tumours. f, IHC
staining of spleen, liver and kidney tissues from a Vav-tTA;shPten mouse with
primary T-cell disease. Sections were stained for H&E, GFP, PTEN and
phospho-AKT(S473) as indicated, showing heterogeneous staining owing to
variable tumour infiltration. Scale bars, 100 µm, insets 25 µm.
Extended Data Figure 3 | Immunophenotype, chromosomal aberrations
and Notch1 mutations observed in murine shPten tumours. a, Flow
cytometric analysis of organ infiltration by primary tumours in Vav-tTA;shPten.1522
transgenic mice. Single-cell suspensions of indicated tissues
were analysed for eGFP, Thy1.2, CD4 and CD8 expression. BM, bone marrow;
LN, lymph node; PB, peripheral blood. b, Spectral karyotyping analysis of a
T-cell tumour arising in Vav-tTA;shPten.1522 mice, showing a t(14;15)
translocation. c, CGH analysis of a Vav-tTA;shPten.1522 leukaemia.
Genomic tumour DNA was analysed on Affymetrix CGH SNP arrays and
compared to normal skin tissue. The x axis indicates genomic coordinate and the y axis
represents log2(tumour/germline). d, Schematic of the murine NOTCH1
protein generated using ProteinPaint (http://explore.pediatriccancergenomeproject.org/proteinPainter),
highlighting the different NOTCH1
protein domains and the mutations detected in the murine shPten and Pten−/−
T-ALL tumours.
a
Mouse no.   Karyotype
CMg90       40~42, XY, +Y[1], -X[1], +2[8], -8[1], -18[1] [cp10]
CM9g92      41, XX, +15
CM122       37~41, XX, t(14;15)[6], -X[1], +10[1], -3[1], -8[1], -9[1], -16[1], +4[1], +10[1] [cp11]


b
GeneName | Chr | mm9 Pos | Class | AAChange | ProteinGI | mRNA_acc | Sample name
Notch1 | 2 | 26315225 | frameshift | S2475fs | 224967065 | NM_008714 | shPten tumor 1 primary
Notch1 | 2 | 26315225 | frameshift | S2475fs | 224967065 | NM_008714 | shPten tumor 1 secondary
Notch1 |  |  |  |  | 224967065 | NM_008714 | shPten tumor 2
Notch1 |  |  |  |  | 224967065 | NM_008714 | shPten tumor 2
Notch1 | 2 | 26315564 | frameshift | R2361fs | 224967065 | NM_008714 | shPten tumor 3
Notch1 | 2 | 26322049 | missense | F1692S | 224967065 | NM_008714 | shPten tumor 3
Notch1 | 2 | 26315567 | frameshift | T2360fs | 224967065 | NM_008714 | shPten tumor 4
Notch1 | 2 | 26315567 | frameshift | T2360fs | 224967065 | NM_008714 | shPten tumor 5
Notch1 |  |  |  |  | 224967065 | NM_008714 | shPten tumor 6
Notch1 |  |  |  |  | 224967065 | NM_008714 | shPten tumor 6
Notch1 | 2 | 26315553 | deletion | R2361fs | 224967065 | NM_008714 | PTEN-KO tumor
c
Pten shRNA     97-mer oligo
shPten.1967    TGCTGTTGACAGTGAGCG CCCAGATGTTAGTGACAATGAA TAGTGAAGCCACAGAT GTATTCATTGTCACTAACATCTGGA TGCCTACTGCCTCGGA
shPten.2049    TGCTGTTGACAGTGAGCG AAAGATCAGCATTCACAAATTA TAGTGAAGCCACAGAT GTATAATTTGTGAATGCTGATCTTC TGCCTACTGCCTCGGA
SaPiena7e8     TGCTGTTGACAGTGAGCG CATCGATAGCATTTGCAGTATA TAGTGAAGCCACAGAT GTATATACTGCAAATGCTATCGATT TGCCTACTGCCTCGGA
ante ANee      TGCTGTTGACAGTGAGCG ACCAGCTAAAGGTGAAGATATA TAGTGAAGCCACAGAT GTATATATCTTCACCTTTAGCTGGC TGCCTACTGCCTCGGA
Snr ienaleay   TGCTGTTGACAGTGAGCG CTTGGGTAAATACGTTCTTCAT TAGTGAAGCCACAGAT GTAATGAAGAACGTATTTACCCAAA TGCCTACTGCCTCGGA
BEE TEn anes   TGCTGTTGACAGTGAGCG ATTCTGTGAAGATCTTGACCAA TAGTGAAGCCACAGAT GTATTGGTCAAGATCTTCACAGAAG TGCCTACTGCCTCGGA
SuPIEN Vee     TGCTGTTGACAGTGAGCG ACTAAGTGAAGATGACAATCAT TAGTGAAGCCACAGAT GTAATGATTGTCATCTTCACTTAGG TGCCTACTGCCTCGGA
Sn ten e0ee    TGCTGTTGACAGTGAGCG AATCAGCATTCACAAATTACAA TAGTGAAGCCACAGAT GTATTGTAATTTGTGAATGCTGATC TGCCTACTGCCTCGGA
Snir veuenney  TGCTGTTGACAGTGAGCG ACAGTATAGAGCGTGCAGATAA TAGTGAAGCCACAGAT GTATTATCTGCACGCTCTATACTGC TGCCTACTGCCTCGGA
anReH Aes      TGCTGTTGACAGTGAGCG CATGGCTAAGTGAAGATGACAA TAGTGAAGCCACAGAT GTATTGTCATCTTCACTTAGCCATT TGCCTACTGCCTCGGA
SiriaatseS     TGCTGTTGACAGTGAGCG ACAGCTAAAGGTGAAGATATAT TAGTGAAGCCACAGAT GTAATATATCTTCACCTTTAGCTGG TGCCTACTGCCTCGGA
SHEEN ASS      TGCTGTTGACAGTGAGCG CGCAGATAATGACAAGGAGTAT TAGTGAAGCCACAGAT GTAATACTCCTTGTCATTATCTGCA TGCCTACTGCCTCGGA
shPten.2136    TGCTGTTGACAGTGAGCG CCAGATTGCAGTTATAGGAACA TAGTGAAGCCACAGAT GTATGTTCCTATAACTGCAATCTGA TGCCTACTGCCTCGGA
SNE IE S52)    TGCTGTTGACAGTGAGCG CAGTGTTATAAACTCCACTTAA TAGTGAAGCCACAGAT GTATTAAGTGGAGTTTATAACACTA TGCCTACTGCCTCGGA


Extended Data Figure 4 | Summary of karyotyping and Notch1 sequencing
of shPten T-ALL tumours. a, Results from a multiplex FISH analysis of three
different primary shPten-induced T-ALL tumours. At least ten cells were
analysed for each sample, and chromosomal gains, deletions or translocations
are highlighted. b, Summary of Notch1 mutations identified in shPten and Pten
knockout tumours. c, Sequence of all shRNAs targeting murine Pten that were
tested in the study. Sense and guide strand are highlighted in red.
Extended Data Figure 5 | GSEA shows similar gene expression patterns in
human and mouse T-ALL lacking Pten. a, GSEA of a mouse shPten signature
in PTEN-altered human T-ALL was tested after establishing the shPten-dependent
signature using the 100 most upregulated genes in shPten T-ALL
samples (untreated, n = 3) against PTEN-restored samples (Dox-treated,
n = 4) as determined by RNA-seq analysis (data not shown). Publicly available
human T-ALL gene expression profiles (GSE28703, n = 47) were processed
using RMA (quantile normalization) and supervised for PTEN status (PTEN
altered, including PTEN deletion, mutation or both, n = 10; PTEN wild-type
(WT), n = 37) according to the published sample annotation". Statistical
significance of GSEA results was assessed using 1,000 sample permutations.
b, For enrichment of the human PTEN T-ALL signature in mouse shPten
knockdown (kd) T-ALL (Dox-off) profiles against Pten-restored (Dox-on)
profiles, a human PTEN-disrupted signature was generated by including the
100 most upregulated genes in PTEN-disrupted versus PTEN-wild-type T-ALL
samples. Mouse genes were ranked by supervising untreated to Dox-treated
shPten T-ALLs. Statistical significance of human PTEN-disrupted signature
enrichment was assessed using 1,000 gene set permutations.
Extended Data Figure 6 | Secondary recipients of shPten T-ALL cells display
extensive intestinal tumour infiltration similar to a subset of human
patients characterized by peripheral T-cell lymphoma and low PTEN
expression. a, Overall survival of sublethally irradiated Rag1−/− mice
transplanted with 1 × 10° T-ALL cells from primary Vav-tTA;shPten.1522
mice compared to untransplanted mice (n = 5 for both groups, P < 0.003).
b, IHC staining for eGFP expression in the indicated tissues from secondary
T-ALL transplant recipients. Scale bars, 400 µm, 100 µm for insets. c, Overall
survival of PTEN normal (WT) versus PTEN-altered patients with T-ALL
analysed from published data on patients with T-ALL" (P = 0.02). PTEN-altered
(n = 20) includes patients with PTEN deletion, mutation,
underexpression (<0.8 sigma after z scoring) and any combination of such
alterations; PTEN normal (n = 62) includes all other patients with available
data. d, PTEN IHC staining of tissue microarrays of tumour sections from
Memorial Sloan Kettering Cancer Center patients with peripheral T-cell
lymphomas. Examples of low (top panel) and high (bottom panel) PTEN
expression samples are shown. e, Contingency table showing a significant
association (P < 0.003; Fisher’s exact test) between low expression of PTEN and
intestinal infiltration in PTCL patients. f, Overall survival of Rag1−/− mice
transplanted with T-ALL cells from Ptenfl/fl;Lck-cre mice ± Dox (n = 5 for each
group). g, Weight of spleen (n = 4) and lymph nodes (n = 8) in Rag1−/− mice
transplanted with Vav-tTA;shPten leukaemic cells untreated or treated with
Dox for 5 days (± s.d.). h, MRI of Rag1−/− mice transplanted with Vav-tTA;shPten
leukaemic cells untreated or treated with Dox for 5 days, 14 days
after transplant. Arrows highlight lymph nodes (LN) and increased signals in
the liver. Representative images for one out of three analysed mice per
condition are shown. i, MRI of the intestine and liver of the same mice
as in h. Dashed arrows highlight the liver, full arrows the intestine.
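Panel e reports a Fisher's exact test on a 2 × 2 contingency table (PTEN expression low/high versus intestinal infiltration yes/no). With all margins fixed, the count in one cell follows a hypergeometric distribution, and the two-sided P value sums the probabilities of every table that is no more probable than the observed one (the convention used by R's `fisher.test`). The patient counts are not given in the text, so the example uses a textbook table; the function name and row/column layout are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as improbable as the observed one (small tolerance for float ties)
    return sum(p for p in map(table_prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


# Textbook example: rows = PTEN low/high, columns = infiltration yes/no (hypothetical counts)
print(fisher_exact_two_sided(3, 1, 1, 3))
```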
Extended Data Figure 7 | Pten reactivation affects multiple pathways and
increases apoptosis in tumour cells infiltrating the intestine, but not in the
spleen. a, b, Heatmaps of the top 30 upregulated (a) and downregulated (b) genes
after Pten reactivation as determined by RNA-seq on CD4-sorted leukaemic
samples isolated from the spleen. Three mice with Pten knocked down and
four mice with reactivated Pten were analysed. Pten is one of the top 50
upregulated genes after reactivation, but is not included in the list.
c, Bubble graph visualization of the most significantly affected pathways as
determined by DAVID pathway analysis. The y axis represents relative pathway
enrichment in Pten-reactivated versus Pten-knockdown leukaemic cells, and
bubble size is inversely proportional to P value. d, IHC analysis for
expression of GFP and PTEN in spleen, lymph node (LN) and liver from
shPten-tumour-transplanted mice ± Dox treatment (5 days after start of Dox
treatment; n = 3 per group). Representative sections are shown. Scale bars,
100 µm for full images, 20 µm for insets. e, In vivo 5-bromodeoxyuridine
(BrdU) uptake in leukaemic cells isolated from the lymph nodes of mice
transplanted with Vav-tTA;shPten primary T-ALL tumours ± Dox. n = 3 for
each group (± s.d.). f, TUNEL staining of spleen and intestinal sections of
Rag1−/− mice serially transplanted with Vav-tTA;shPten leukaemia cells and
either left untreated or treated with Dox 24 h before sectioning. Scale bars,
200 µm (×2.5) and 50 µm (×10). g, Quantification of TUNEL-stained sections
from the intestinal sections in f. TUNEL-positive cells from three representative
areas of 1 mm² from two different intestine sections were counted for each
condition (P < 0.01) (± s.d.).
Extended Data Figure 8 | AKT and S6 protein phosphorylation is affected
by PTEN reactivation in the intestine. a, IHC staining for phospho-S6
(pS235/236-S6) and phospho-AKT (pS473-AKT) of spleen sections from
Rag1−/− mice transplanted with Vav-tTA;shPten tumour cells from primary
mice and either treated with Dox or left untreated, 2 days after the start of treatment
(n = 3 per group). Scale bars, 100 µm, 20 µm for insets. Representative images
are shown. b, IHC staining for pS473-AKT (bottom) in the intestine, showing
very low pAKT signal in the intestinal epithelial cells independent of Dox
treatment status (arrows; bottom left and right panels), whereas strong
staining for pAKT was detected in some of the infiltrating tumour cells
(arrowheads). The signal was reduced concomitantly with the overall
reduction of the Pten-shRNA-linked GFP signal (top) after 36 h of Dox
treatment (+ Dox; right panels). c, Representative histogram of flow cytometric
analysis for intracellular pS6 signal in CD4+ cells isolated from spleen and
intestine of Rag1−/− mice transplanted with shPten tumour cells and either
treated with Dox for 5 days or left untreated. d, e, Flow cytometric quantification
of pS6 signal in CD4+ cells isolated from the intestine (d) and spleens (e) of
Rag1−/− mice transplanted with primary shPten tumours and treated ± Dox
for 5 days (n = 4 for each condition, P < 0.04 for the intestine and not
significant for the spleen by paired t-test). MFI, mean fluorescence intensity;
error bars represent s.d.
Extended Data Figure 9 | NCr mice display a reduced intestinal tumour
infiltration, which is not dependent on the absence of the thymus.
a, Brightfield pictures of the intestinal sites of Rag1−/− and NCr nude mice
serially transplanted with shPten tumours (top four panels) and fluorescence
images (FI) of cells infiltrating the small intestine in these mice (bottom four
panels). Scale bars, 800 µm (top FI panels) and 100 µm (bottom FI panels).
Pictures were taken on a Nikon SMZ 1000 stereomicroscope. b, Quantification
of the intestinal infiltration in transplanted Rag1−/− or NCr mice by flow
cytometry (P < 0.03). c, d, Weight of lymph nodes (P < 0.01) (c) and spleens
(not significant) (d) in transplanted Rag1−/− and NCr mice. e, CCL25 expression in
the small intestine of Rag1−/− and NCr mice measured by ELISA. Error bars
in b–e show s.d. f, Western blot analysis of CCL25 expression in the small
intestine of Rag1−/− and NCr mice. g, Overall survival of Rag1−/− and
thymectomized Rag1−/− mice after transplant with shPten T-ALL cells (n = 5
per group). h, H&E and immunohistochemical analysis of CD3 expression in
spleen, liver and intestine from Rag1−/− and thymectomized Rag1−/− mice
transplanted with shPten T-ALL cells. Scale bars, 200 µm for spleen and liver
and 100 µm for intestinal samples. i, Flow cytometric measurement of CCR9
expression on shPten leukaemia cells either in the absence of Dox (Pten
knocked down) or Dox-treated (Pten reactivated). One representative analysis
out of four analysed on/off Dox pairs is shown. A CCR9-negative B-cell line was
used as control. j, Immunoblot analysis of PTEN, p-AKT(S473) and ACTB
expression in human HBP-ALL T-ALL cells infected with either a control
shRNA (shRenilla) or an shRNA targeting PTEN, and either starved or
stimulated for 15 min with 500 ng ml−1 CCL25. k, shPten tumour cell
migration across a Boyden chamber in the presence or absence of 1 µg ml−1
Dox and 500 ng ml−1 CCL25. One representative experiment of two is shown;
samples were run in triplicate; **P < 0.01, ***P < 0.001 by t-test (± s.d.).
Extended Data Figure 10 | CCR9 inactivation by shRNA knockdown or by
pharmacologic inhibition attenuates intestinal tumour infiltration.
a, CCR9 expression on the surface of shPten tumour cells infected with either a
control shRNA (shRenilla) (left) or an shRNA targeting Ccr9 (right) as
measured by flow cytometry, compared to uninfected cells, respectively. b, Flow
cytometry-based quantification of CCR9 suppression in shCcr9-infected shPten
T-ALL cells compared to shRenilla-infected cells, n = 5 for each cohort (± s.d.).
c, Raw percentage of shRenilla/shCcr9-expressing shPten T-ALL cells (± s.d.)
in different tissue compartments of mice 12 days after transplantation,
determined by flow cytometry, n = 5 for each cohort. P < 0.0005 (intestine).
d, IHC analysis for mCherry-expressing cells (left, shRenilla-mCherry; right,
shCcr9-mCherry) in tissue sections of mice from c. Spleen, liver and intestinal
sections of mice transplanted with shRenilla- or shCcr9-infected T-ALL cells
were analysed for mCherry expression. Representative stains from one mouse
out of three analysed mice are shown. Scale bars, 100 µm, insets 20 µm. e, IHC
staining for eGFP expression in representative sections of small intestine, liver
and spleen of Vav-tTA;shPten-tumour-bearing mice treated with vehicle or the
CCR9 inhibitor CCX8037 (n = 3). Scale bars, 400 µm (×2.5) and 100 µm
(×10). f, Flow cytometric quantification of intestinal tumour infiltration in
Rag1−/− mice transplanted with Vav-tTA;shPten leukaemia cells and treated
with vehicle (n = 4) or a small-molecule inhibitor of CCR9 (n = 5). *P < 0.05
by t-test (± s.d.). g, Immunoblot analysis of p-AKT expression 15 min after
stimulation of shPten leukaemia cells with CCL25 in the absence or presence of
the indicated concentrations of CCX8037. h, IHC analysis of eGFP and p-AKT
signal in representative sections of small intestine from Vav-tTA;shPten-tumour-bearing
mice treated with vehicle or the CCR9 inhibitor CCX8037.
Scale bars, 100 µm, 25 µm for insets.
Inactivation of PI(3)K p110δ breaks regulatory
T-cell-mediated immune tolerance to cancer

Khaled Ali, Dalya R. Soond, Roberto Piñeiro, Thorsten Hagemann, Wayne Pearce, Ee Lyn Lim, Hicham Bouabe,
Cheryl L. Scudamore, Timothy Hancox, Heather Maecker, Lori Friedman, Martin Turner, Klaus Okkenhaug
& Bart Vanhaesebroeck


Inhibitors against the p110δ isoform of phosphoinositide-3-OH kinase
(PI(3)K) have shown remarkable therapeutic efficacy in some human
leukaemias’”. As p110δ is primarily expressed in leukocytes’, drugs
against p110δ have not been considered for the treatment of solid
tumours*. Here we report that p110δ inactivation in mice protects
against a broad range of cancers, including non-haematological solid
tumours. We demonstrate that p110δ inactivation in regulatory T cells
unleashes CD8+ cytotoxic T cells and induces tumour regression. Thus,
p110δ inhibitors can break tumour-induced immune tolerance and
should be considered for wider use in oncology.

PI(3)K p110δD910A/D910A (δD910A) mice, in which endogenous p110δ kinase
is inactive, present specific immune deficiencies*® but are not predisposed
to cancer. To test whether host p110δ activity affects tumour
growth, we inoculated weakly immunogenic syngeneic cancer cell lines
into δD910A mice. Compared to wild-type mice, δD910A mice were more
resistant to B16 melanoma, with reduced tumour incidence and almost
abrogated lymph node metastasis in those mice that developed tumours
(Fig. 1a). Growth of Lewis lung carcinoma (LLC) and EL4 thymoma
cells was also suppressed in δD910A mice (Fig. 1b, c). Similar observations
were made with luciferase-labelled 4T1 breast cancer cells injected into
the mammary fat pad. At euthanization, δD910A mice showed reduced
mass and luciferase activity of the primary 4T1 tumour (Fig. 1d) and
less metastasis (Fig. 1e). In wild-type mice, 4T1 tumours were detected
by day 10 and grew progressively until day 30, at which point the mice
became moribund (Fig. 1f). In some δD910A mice, 4T1 tumours grew
initially, but then started to regress from days 15–20 onwards (Fig. 1f).
Across ten independent experiments, 97% (71/73) of wild-type mice
had an observable cancer mass at the end of study, compared to 65%
(43/66) of δD910A mice, with a median survival time of 23 and 40 days
in wild-type and δD910A mice, respectively (Fig. 1g).
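Median survival times such as those in Fig. 1g are conventionally read off a Kaplan–Meier curve as the earliest time at which estimated survival falls to 0.5 or below. The sketch below implements the product-limit estimator with right-censoring and that median rule; it is an illustration with hypothetical follow-up data, not the authors' analysis software.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. times: follow-up in days; events: 1 = death/endpoint, 0 = censored.
    Returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """Smallest event time with S(t) <= 0.5, or None if the curve never reaches 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical cohort: one animal censored at day 10
curve = kaplan_meier([5, 10, 15, 20], [1, 0, 1, 1])
print(curve, median_survival(curve))
```

Censored animals leave the risk set without lowering the survival estimate, which is why the median here falls at day 15 rather than at the midpoint of the raw times.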

Effective tumour immunity is limited by regulatory T cell (Treg)-mediated
immune suppression’. δD910A mice show enhanced FOXP3+CD4+
Treg in the thymus but impaired subsequent Treg maintenance
and functionality in the periphery®. δD910A Treg also produce less
interleukin (IL)-10 and express lower levels of CD38, but show normal
expression of most ‘Treg-signature’ genes, including FOXP3, CD25 (also known
as IL2RA), CTLA4 and ICOS*”. We therefore considered that reduced
Treg function in δD910A mice might lead to enhanced tumour resistance.
FOXP3+CD4+ Treg in the draining lymph nodes of 4T1 tumour-bearing
δD910A mice did not expand as robustly as in wild-type mice (Fig. 2a);
however, no consistent differences in Treg expansion were observed in
the B16 or EL4 tumour models between naive and tumour-bearing
mice of either genotype (not shown). To assess Treg function, we carried
out adoptive Treg transfer experiments in EL4 tumour-bearing
Figure 1 | Impact of genetic inactivation of p110δ on tumour growth and
metastasis. a, Percentage of mice with visible B16 ear tumours (left) or lymph
node metastasis (right). Photographs show B16 metastases in cervical
lymph nodes and representative excised lymph nodes. b–d, Primary tumour
burden of the indicated tumour lines. e, 4T1 metastasis as detected by luciferase
activity (left and middle) or histology (right), expressed as a percentage of the
total number of tumour-bearing animals per group. f, Growth of primary
4T1 tumours. g, Survival of 4T1 tumour-bearing mice. *P < 0.05, **P < 0.01
(non-parametric Mann–Whitney test). Numbers in brackets represent the
number of mice used per experiment. Each dot represents an individual mouse.
Shown are the mean ± standard error of the mean (s.e.m.) from at least two
independent experiments in which statistical significance was demonstrated.
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Figure 2 | Inactivation of p1106 in Tg is sufficient to confer cancer 
resistance. a, Relative and total numbers of T,.g in the draining lymph nodes 
of naive and 4T1 tumour-bearing mice. b, Impact of adoptive transfer of Tyeg 
into 8°?! mice on EL4 tumour wet weight and tumour-infiltrating CD8* 
T cells. c, Number of mice with visible B16 tumours and B16 tumour weight in 
mice of the indicated genotype. d, Survival of EL4 tumour-bearing mice of 
the indicated genotype. *P < 0.05, **P < 0.01, ***P < 0.001 (a-c, non- 
parametric Mann-Whitney t-test; d, log-rank (Mantel-Cox) test). Numbers 
in brackets denote the number of mice used per experiment. Each dot 
represents an individual mouse. Shown are the mean + standard error of mean 
(s.e.m.) from at least two independent experiments in which statistical 
significance was demonstrated. 


mice. Transfer of wild-type Ty eg into §P9104 mice restored EL4 tumour 
growth and suppressed the relative abundance of tumour-infiltrating 
CD8°* T cells (Fig. 2b). By contrast, the transfer of the same number of 
HO2108 Treg into 6 9104 mice did not affect EL4 tumour growth (Fig. 2b), 
indicating a functional defect in 8??!°* Treg: FOXP3 YFP Cre §flox/flox 
mice, in which p1106 was selectively deleted in T,g (by a Cre transgene 
expressed from the Foxp3 locus) did not display spontaneous auto- 
immune or inflammatory responses (not shown) but showed reduced 
growth of B16 cells (Fig. 2c) and extended survival time upon inocula- 
tion of EL4 cells, to an even greater extent than in 8°!“ mice (Fig. 2d). 
These data demonstrate that p1106 inactivation in T,., is both neces- 
sary and sufficient to confer tumour resistance. However, these data also 
revealed a potential negative impact of p1106 inhibition on effector T cells, 
as FOXP3*FP-Crex §flox/flox rm ice were more cancer-resistant than 5°10“ 
mice (Fig. 2d). We therefore investigated the effect of p1106 inactivation 
on CD4 and CD8 effector T cells in the context of an ongoing tumour 
response. 
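The survival comparison in Fig. 2d uses a log-rank (Mantel-Cox) test. As a hedged illustration of what that test computes (not the authors' analysis code), the sketch below accumulates observed-minus-expected end points in one group at each event time, divides the squared sum by its hypergeometric variance, and refers the result to a chi-square distribution with one degree of freedom. The function name and the toy survival times are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[bool],
                 times_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank (Mantel-Cox) test.

    events_* are True when the end point was observed and False when the
    animal was censored. Returns (chi-square statistic, two-sided P value)
    using the chi-square distribution with one degree of freedom.
    """
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0  # observed minus expected end points in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in pooled if g == 0 and ti >= t)  # at risk in A
        n_b = sum(1 for ti, _, g in pooled if g == 1 and ti >= t)  # at risk in B
        d_a = sum(1 for ti, e, g in pooled if g == 0 and e and ti == t)
        d = sum(1 for ti, e, _ in pooled if e and ti == t)  # end points, both groups
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this time point
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value


# Hypothetical example: group A reaches the end point much earlier than group B.
chi2, p = logrank_test([1, 2, 3], [True] * 3, [10, 11, 12], [True] * 3)
print(round(chi2, 3), round(p, 4))
```

The P value uses the identity that a chi-square variable with one degree of freedom is a squared standard normal, so its upper tail equals `erfc(sqrt(x / 2))`.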

Depletion of CD8+ T cells, but not of CD4+ T cells, on day 10 after 4T1 inoculation in p110δD910A mice eliminated cancer protection (Fig. 3a, b). These data show that CD8+ T cells are responsible for restricting tumour growth in p110δD910A mice, but do not exclude an accessory role for CD4+ T cells. In line with published data, naive wild-type mice had higher relative numbers of activated/memory CD44high CD4+ and CD44high CD8+ T cells than p110δD910A mice (Extended Data Fig. 1a). Upon 4T1 inoculation in wild-type mice, the relative numbers of these cells were either enhanced (tumour-draining lymph nodes) or reduced (blood and spleen), but in p110δD910A mice they showed a trend towards expansion (Extended Data Fig. 1a), indicating that p110δD910A mice are capable of mounting both CD4+ and CD8+ T-cell responses against 4T1 tumours. Wild-type and p110δD910A splenocytes from tumour-bearing mice, incubated in vitro with mitomycin C-treated 4T1 cells, generated equivalent cytotoxic activity against 4T1, with no specific lysis of LLC (Fig. 3c). Compared to wild-type cultures, p110δD910A cultures contained similar proportions of CD4 and CD8 T-cell subsets (Extended Data Fig. 1b), with a reduced frequency of activated/memory CD44high CD4+ cells (Fig. 3d) and an unaffected frequency of CD44high CD8+ cells (Fig. 3d). Interestingly, despite this reduced proportion of p110δD910A CD44high CD4+ cells, the frequency of interferon (IFN)-γ+ CD4+ cells in phorbol myristate acetate (PMA)/ionomycin-stimulated cultures of splenocytes from 4T1 tumour-bearing mice was unaffected by p110δ inactivation (Fig. 3e), and the frequency of IFN-γ+ CD8+ cells was even enhanced upon p110δ inactivation (Fig. 3e). Upon inoculation with LLC cells expressing ovalbumin (LLC-OVA), wild-type and p110δD910A mice generated similar levels of tumour-infiltrating OVA-specific CD8+ T cells (Fig. 3f), showing that systemic in vivo inactivation of p110δ does not impede the development or recruitment of antigen-specific anti-tumour CD8+ cells.

To test the intrinsic ability of p110δD910A CD8 T cells to eliminate tumours, we crossed p110δD910A mice to OT-I transgenic mice, which carry an OVA-specific, MHC class I-restricted T-cell receptor transgene. In vitro-generated p110δD910A OT-I cytotoxic T lymphocytes (CTLs) were less efficient than wild-type OT-I CTLs at killing EL4-OVA cells (Fig. 3g) and produced lower levels of cytotoxic mediators (Extended Data Fig. 1c). Pharmacological inactivation of p110δ during the in vitro CTL expansion of wild-type OT-I cells partially suppressed CTL function, in a manner indistinguishable from genetic inactivation of p110δ (Fig. 3g), whereas p110δ blockade during the killing phase itself did not affect CTL function (Fig. 3g). Despite these in vitro defects in p110δD910A OT-I CTLs, adoptive transfer of these cells into wild-type mice before challenge with EL4-OVA provided cancer protection equal to that of wild-type OT-I T cells (Fig. 3h), showing that in vivo CTL responses can remain competent in the absence of CD8 T-cell-intrinsic p110δ activity. Taken together, these data indicate that p110δ inhibition impairs the differentiation of CD8 T cells into fully competent CTLs; however, fully differentiated CTLs do not seem to require p110δ activity to kill target cells and, on balance, in the context of reduced Treg function in p110δD910A mice, can mediate effective anti-tumour activity.

CD4 T cells can also contribute to tumour elimination by promoting the activation of macrophages and natural killer cells or by direct lysis of MHC class II+ tumour cells. Indeed, CD4+ T cells with enhanced PI(3)K activity are superior in their capacity to reject tumour growth, probably as a consequence of their increased production of IFN-γ. Conversely, p110δD910A OT-II CD4 cells were less effective than wild-type OT-II cells in preventing EL4-OVA tumour growth (Fig. 3h), consistent with our previous finding that p110δD910A OT-II T cells produce less IFN-γ in vitro and in vivo. Therefore, in the context of an otherwise normal immune system, p110δD910A CD4+ cells show inferior anti-tumour immunity. However, the production of IFN-γ by CD4 and CD8 T cells from 4T1 tumour-bearing p110δD910A mice, in which Treg are also defective, appeared to be intact (Fig. 3e), suggesting that p110δ inhibition can shift the balance between regulatory and effector CD4+ T cells such that the effector cells prevail in anti-tumour responses.

A salient feature of CD4 and CD8 T cells is the ability to raise a more potent and rapid immune response upon subsequent exposure to cognate antigen. Upon surgical removal of 4T1 primary tumours when they had reached 9 mm in diameter and established metastatic foci, all wild-type mice succumbed to regrowth of the primary tumour and metastatic disease. By contrast, >50% of post-surgical p110δD910A mice survived beyond 100 days (Fig. 3i), demonstrating that p110δ inhibition can suppress cancer relapse and, presumably, metastatic cancer after surgery. p110δD910A mice that had remained tumour-free for >200 days after surgery were cancer-resistant upon rechallenge with a higher 4T1 dose (Fig. 3j), suggesting that surgical intervention in p110δD910A mice supports the development of an effective anti-tumour memory response.
To assess the potential importance of p110δ in myeloid cells in cancer, we tested the impact of p110δ inactivation in Rag-/- mice. Rag-/- mice, which lack mature B and T cells, showed enhanced primary 4T1 tumour size and metastasis (Fig. 4a) compared to wild-type mice. Rag-/- × p110δD910A mice showed a 4T1 primary tumour burden similar to that of Rag-/- mice (Fig. 4a) but had fewer metastatic lesions in lung (Fig. 4a) and liver (not shown), indicating that p110δ inactivation in a non-B/T-cell lineage delays 4T1 tumour progression but is not sufficient to instigate tumour rejection. We next assessed the impact of p110δ inactivation on myeloid-derived suppressor cells (MDSCs), a heterogeneous population of bone marrow-derived myeloid cells that co-express the CD11b and Gr1 surface markers and have a prominent role in immune suppression in cancer. Neutrophils are also CD11b+ Gr1+ but are thought not to be immune-suppressive. Upon inoculation with 4T1 cells, which are known to be potent MDSC inducers, CD11b+ Gr1+ cells accumulated in the spleens of both wild-type and p110δD910A mice, even before tumours were palpable, and continued to differentially accumulate in both genotypes as tumours grew, correlating with tumour size (Fig. 4b). The Ly6C and Ly6G surface markers, which are both recognized by the Gr1 antibody, have been used to subdivide MDSCs into two CD11b+ subpopulations, namely monocytic (M)-MDSCs (Ly6Chigh Ly6Glow) and polymorphonuclear (PMN)-MDSCs (Ly6Clow Ly6Ghigh). Although neutrophils are difficult to differentiate from PMN-MDSCs, here we designated the neutrophil population as Ly6Ghigh cells with intermediate/high Ly6C expression (Fig. 4c and Extended Data Fig. 2a). PMN-MDSCs, predominant in 4T1 tumour-bearing wild-type mice (Fig. 4c), were substantially reduced in p110δD910A mice, correlating with a relative increase in neutrophils in the latter (Fig. 4c). Interestingly, the number of PMN-MDSCs in spleens from 4T1 tumour-bearing mice correlated with the number of Treg (Fig. 4d). Depletion of CD8+ cells in 4T1 tumour-bearing p110δD910A mice, which led to enhanced tumour growth (Fig. 3a, b), also led to increased PMN-MDSC numbers and reduced neutrophil numbers (Fig. 4e). It was therefore difficult to ascertain whether the reduced PMN-MDSC numbers in p110δD910A mice are a consequence of an intrinsic role for p110δ in these cells or an indirect consequence of a reduced tumour burden in p110δD910A mice (Fig. 4b). In support of the former, wild-type PMN-MDSCs suppressed T-cell proliferation in vitro, whereas MDSCs from p110δD910A mice with regressing tumours did not (Fig. 4f and Extended Data Fig. 2b). Neutrophils from neither genotype suppressed T-cell responses (Fig. 4f). Moreover, splenocytes from tumour-bearing p110δD910A mice showed reduced in vitro production of transforming growth factor-β, vascular endothelial growth factor and IL-6 (Fig. 4g), each of which can contribute to immune suppression and/or tumour growth.

Figure 3 | Impact of p110δ inactivation on T-cell-mediated anti-tumour immunity. a, Growth of 4T1 in p110δD910A mice injected with antibodies to CD4 or CD8. Arrow indicates the time of antibody injection. b, Metastasis in CD4 or CD8 T-cell-depleted 4T1 tumour-bearing p110δD910A mice. c, In vitro cytotoxic activity of splenocytes, isolated from 4T1 tumour-bearing wild-type and p110δD910A mice 21 days after inoculation and cultured for 4 days with mitomycin-treated 4T1 cells. E/T, effector to target (4T1 or LLC) ratio. d, Frequency of CD44high CD4+ and CD8+ T cells in splenocytes from 4T1 tumour-bearing mice cultured for 5 days with mitomycin-treated 4T1 cells. e, Frequency of IFN-γ+ T cells after 16 h PMA + ionomycin stimulation of splenocytes from wild-type and p110δD910A 4T1 tumour-bearing mice. f, Relative levels of tumour-infiltrating CD3+ lymphocytes and OVA-specific CD8+ T cells in LLC-OVA tumours in wild-type or p110δD910A mice. g, In vitro cell killing of a 1:1 EL4-OVA:EL4 mix following 24 h incubation with CTLs from wild-type or p110δD910A OT-I mice (at an E/T ratio of 10:1), incubated with or without the p110δ inhibitor IC87114 during the 8-day expansion phase, the 24-h killing phase, or both. Cell killing efficiency is expressed as the ratio of EL4-OVA cells over EL4 cells remaining after incubation with effector cells. h, Effect of adoptive transfer into wild-type mice of OT-I CD8+ or OT-II CD4+ cells on growth of subsequently inoculated EL4-OVA. i, Survival of post-surgical 4T1 tumour-bearing mice. j, Survival of p110δD910A mice that had remained tumour-free >200 days after surgery, and of naive wild-type mice, following injection of 10,000 4T1 cells. Statistics are as described in the legend to Fig. 2.
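The Fig. 3g legend defines killing efficiency as the ratio of EL4-OVA to EL4 cells remaining after co-culture. The minimal sketch below shows that arithmetic, assuming cell counts for each population (for example, from flow cytometry) after the 24-h killing phase. The second function's normalization to a no-effector control is an assumed convention that the legend does not state, and all numbers are illustrative.

```python
def killing_ratio(ova_remaining: int, el4_remaining: int) -> float:
    """Ratio of antigen-bearing (EL4-OVA) to control (EL4) targets left after co-culture.

    Both populations start at 1:1, so a ratio near 1.0 means no antigen-specific
    killing and lower values mean that more EL4-OVA cells were killed.
    """
    if el4_remaining == 0:
        raise ValueError("no control EL4 cells counted")
    return ova_remaining / el4_remaining


def specific_killing_percent(ratio_with_ctl: float, ratio_without_ctl: float) -> float:
    """Hypothetical normalization: % specific killing relative to a no-CTL well."""
    return 100.0 * (1.0 - ratio_with_ctl / ratio_without_ctl)


# Illustrative counts: CTLs leave 2,000 EL4-OVA and 8,000 EL4 cells;
# a control well without CTLs keeps the mix at 1:1.
with_ctl = killing_ratio(2000, 8000)
without_ctl = killing_ratio(5000, 5000)
print(with_ctl, specific_killing_percent(with_ctl, without_ctl))
```

Because the EL4 cells lack the antigen, they act as an internal reference within each well, so the ratio corrects for differences in cell recovery between wells.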

Administration of PI-3065, a small-molecule inhibitor with selectivity for p110δ (Extended Data Fig. 3a, b and Extended Data Table 1), also suppressed 4T1 tumour growth and metastasis to a similar extent as genetic inactivation of p110δ, marked by initial tumour progression followed by tumour regression (Fig. 5a and Extended Data Fig. 3c, d). Of interest, 4T1 cells do not express detectable levels of p110δ (Extended Data Fig. 3e) and are not growth-inhibited in vitro by PI-3065 (Extended Data Fig. 3f). Long-term administration of PI-3065 to mice was well tolerated and did not induce weight loss (Extended Data Fig. 3g).

We next tested the impact of PI-3065 in the LSL-KrasG12D/+; LSL-Trp53R172H/+; Pdx1-Cretg/+ (or KPC) model of pancreatic ductal adenocarcinoma, which expresses endogenous mutant KRASG12D and p53R172H in PDX1+ pancreatic cells. KPC mice were left to develop palpable disease before treatment with vehicle or PI-3065 was commenced. Under these therapeutic conditions, PI-3065 prolonged survival and reduced the incidence of macroscopic metastases and other disease-associated pathologies (Fig. 5b). The relative abundance of peripheral Treg in lymph nodes after 7 days of treatment was reduced (Fig. 5c), correlating with higher levels of CD44high CD8+ lymphocytes in the draining lymph nodes (Fig. 5d) and relatively higher levels of infiltrating CD8+ T cells in pancreatic lesions 14 days after treatment (Fig. 5e). These data indicate that therapeutic targeting of p110δ can promote immune-mediated elimination of cancer.

Concerns have been raised about inhibiting p110δ in cancer, as this might impair CTLs and negatively impact cancer immune surveillance. Our data show that although p110δ blockade reduces the effectiveness of CTLs, it also overrides Treg- and probably also MDSC-mediated suppression of anti-tumour immune responses, enabling even weakened CTLs to successfully attack tumours. Thus, p110δ is apparently more essential for regulatory than for effector T-cell responses against cancer cells. In addition, inhibition of the PI(3)K pathway in CD8 T cells may help maintain them in a stem-cell-like state with enhanced potential for generating durable anti-tumour responses. Consistent with this notion, p110δD910A mice resisted tumour rechallenge following surgical removal of the first tumour. The p110δ inhibitor Idelalisib has shown impressive therapeutic impact in chronic lymphocytic leukaemia (CLL) and non-Hodgkin's lymphoma. In CLL, p110δ blockade interferes with stroma-derived survival and adhesion signals supporting the tumour cells, but it is unclear whether this fully explains the effectiveness of p110δ inhibition. Our finding that p110δ inhibition can unlock adaptive anti-tumour responses provides a potential additional mechanism for the efficacy of p110δ blockade in CLL, and adds to the emerging rationale for targeting PI(3)K in the tumour stroma to dampen inflammation (p110γ) and angiogenesis (p110α).

Tumour-induced immune suppression constitutes an important barrier to effective anti-tumour immunity and immunotherapy in cancer. Our work suggests that p110δ inhibitors, by disrupting the function of Treg and possibly of MDSCs, have the potential to shift the balance from immune tolerance towards effective anti-tumour immunity. This provides a rationale for p110δ inhibition in both solid and haematological cancers, possibly as an adjuvant to cancer vaccines, adoptive cell therapy or other strategies that promote tumour-specific immune responses.

Figure 4 | Impact of p110δ inactivation on myeloid cells in 4T1 tumour-bearing mice. a, 4T1 primary tumour growth and lung metastasis in wild-type, p110δD910A, Rag-/- and Rag-/- × p110δD910A mice. b, 4T1 tumour growth and total numbers of splenic CD11b+ Gr1high myeloid cells in wild-type and p110δD910A mice. c, Gating strategy used to identify myeloid cell subsets, and frequency of splenic PMN-MDSCs and neutrophils in naive and 4T1 tumour-bearing wild-type and p110δD910A mice. d, Spearman correlation between accumulation of splenic PMN-MDSCs and Treg in wild-type or p110δD910A mice. e, Impact of depleting CD8+ T cells in p110δD910A mice on 4T1 tumour burden and presence of splenic myeloid cell populations. f, Impact of purified splenic myeloid cells on proliferation of anti-CD3-stimulated wild-type T cells. g, Cytokine production by splenocytes from 4T1 tumour-bearing wild-type or p110δD910A mice (30 days after inoculation), individually cultured for 4 days. Statistics are as described in the legend to Fig. 2.
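Figure 4d relates PMN-MDSC and Treg numbers with a Spearman correlation. Spearman's rho is Pearson's correlation computed on ranks, which makes it robust to the skewed, non-linear cell counts typical of such data. The sketch below assumes average ranks for ties and uses hypothetical per-mouse values; it illustrates the statistic and is not the authors' analysis code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-mouse counts (arbitrary units): a monotonic but non-linear relation.
pmn_mdsc = [1, 2, 3, 4, 5]
treg = [1, 4, 9, 16, 25]
print(spearman_rho(pmn_mdsc, treg))
```

A perfectly monotonic relation gives rho = 1 even when it is strongly curved, which is the property that distinguishes Spearman's from Pearson's coefficient.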
Figure 5 | Impact of pharmacological inactivation of p110δ on tumour growth and T-cell responses. a, Mice, dosed with vehicle or PI-3065 (75 mg/kg, daily) for 36 days and inoculated with 10° 4T1 cells 12 h after the first dose, were assessed for tumour growth by luciferase imaging (first panel), tumour weight (second panel) or luciferase activity in tumours excised 35 days after inoculation (third panel). Incidence of 4T1 metastasis (fourth panel), as detected by haematoxylin and eosin (H&E) staining and histology, is expressed as a percentage of the total number of tumour-bearing animals per group. b, Impact of PI-3065 (75 mg/kg) on KPC mouse survival (left) and on macrometastases and cancer-associated pathology (right). c, Proportion of Treg (percentage of CD4+) in the draining lymph nodes of KPC mice administered vehicle or PI-3065. d, Proportion of CD44high T cells (percentage of CD8+) in the draining lymph nodes of KPC mice administered vehicle or PI-3065. e, Relative numbers of CD8+ T cells (percentage of CD45+) in normal pancreas and PDAC lesions of KPC mice treated or not with PI-3065. Statistics are as described in the legend to Fig. 2.


METHODS SUMMARY 


All animal procedures were in compliance with institutional animal care and use committee guidelines. Details of procedures and reagents are described in Supplementary Information.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
Extended Data Figure 1 | Impact of p110δ inactivation on CD4 and CD8 T cells in mice with 4T1 or EL4 tumours. a, Levels of CD44high CD4+ and CD44high CD8+ T cells in the indicated immune compartments of naive and 4T1 tumour-bearing wild-type or p110δD910A mice on day 26 after inoculation. b, Distribution of cells on day 5 of culture of splenocytes, isolated from 4T1 tumour-bearing wild-type and p110δD910A mice 21 days after inoculation, in the presence of mitomycin-treated 4T1 cells. c, Gene expression in CTLs derived from splenocytes from wild-type and p110δD910A OT-I mice, cultured in the presence of SIINFEKL OVA peptide and IL-2. GzmA, granzyme A; GzmB, granzyme B; Prf1, perforin; FasL (CD95L), Fas ligand. Expression levels are presented relative to β2-microglobulin. *P < 0.05, **P < 0.01, ***P < 0.001 (non-parametric Mann-Whitney test). Numbers in brackets indicate the number of mice used per experiment. Each dot represents an individual mouse.
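The legends report a non-parametric Mann-Whitney test for two-group comparisons. With the small group sizes typical of these mouse experiments, the statistic can be computed exactly: count how often values in one group exceed values in the other (the U statistic), then compare that count against every possible relabelling of the pooled values. The sketch below is a minimal illustration under stated assumptions (exact permutation P value, ties counted as one half, hypothetical data), not the software the authors used; statistics packages usually switch to a normal approximation for larger samples.

```python
from itertools import combinations
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U for sample a: number of pairs (x in a, y in b) with x > y, ties counting 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_two_sided_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact permutation P value; enumerates every relabelling, so small groups only."""
    pooled = list(a) + list(b)
    n_a = len(a)
    centre = n_a * len(b) / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(a, b) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the null centre as observed.
        if abs(mann_whitney_u(group_a, group_b) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical values for five mice per genotype, completely separated.
wild_type = [1, 2, 3, 4, 5]
mutant = [6, 7, 8, 9, 10]
print(mann_whitney_u(wild_type, mutant), exact_two_sided_p(wild_type, mutant))
```

With five mice per group, complete separation gives the smallest attainable two-sided P value (2 of 252 relabellings), which is one reason small experiments are often pooled or repeated.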
Extended Data Figure 2 | Impact of p110δ inactivation on myeloid cells in 4T1 tumours. a, Gating strategy used to identify myeloid cell subsets. Splenic cells were gated on CD11bhigh cells, followed by Ly6C and Ly6G gating. FSC, forward scatter; SSC, side scatter (top). Frequency of CD11b+ cells in the spleen of naive wild-type and p110δD910A mice and of 4T1 tumour-bearing mice on day 21 after inoculation (bottom). b, [3H]-Thymidine incorporation in co-cultures of splenocytes and purified myeloid cells, in combinations as indicated, with or without stimulation with anti-CD3 antibodies. Cultures were made using cells derived from individual mice. Error bars represent the standard deviation from the mean of biological replicates. *P < 0.05, **P < 0.01 (non-parametric Mann-Whitney test). Numbers in brackets indicate the number of mice used per experiment. Each dot represents an individual mouse.
Extended Data Figure 3 | Characterization of the p110δ-selective inhibitor PI-3065. a, PI-3065 structure and in vitro IC50 on selected PI3K family members. No significant activity against 72 protein kinases was observed at 10 μM in a KinaseProfiler assay (Millipore). b, Pharmacokinetic parameters of PI-3065. Mean (± s.d.) plasma concentration profile of PI-3065 following a single oral dose (75 mg/kg) administered per os (p.o.) to female BALB/c mice. AUCinf, area under the curve, extrapolated to infinity; Cmax, highest observed plasma concentration; tmax, time at which Cmax occurred; QD, quaque die (every day). c, Growth of primary 4T1 tumours, inoculated in the breast fat pad, measured by calipers and expressed as tumour volume. Mice were dosed per os with vehicle or PI-3065 (75 mg/kg, daily) for 36 days; 10° tumour cells were inoculated 12 h after the first dose. d, Percentage of tumour-free mice upon continuous per os treatment with vehicle or PI-3065 (25 mg/kg, twice daily) for 37 days, with tumour cells inoculated on day 7 of PI-3065 dosing. 15 mice were used for each treatment group. e, Class I PI3K isoform expression in 4T1 cells. f, Proliferation of 4T1 cells following a 4-h treatment with the indicated p110δ inhibitors, washing, and 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulphophenyl)-2H-tetrazolium salt (MTS) staining after 48 h of culture. g, Percentage body weight change (from day 0) of 4T1 tumour-bearing mice upon daily per os administration of PI-3065 (75 mg/kg) or vehicle for 36 consecutive days. *P < 0.05, **P < 0.01 (non-parametric Mann-Whitney test). Numbers in brackets indicate the number of mice used per experiment.
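The pharmacokinetic parameters in panel b (Cmax, tmax and AUCinf) are standard non-compartmental quantities. The sketch below shows one common way to compute them. It rests on several assumptions: hypothetical concentrations, the linear trapezoidal rule, and a log-linear fit to a three-point terminal phase. The legend does not specify the authors' exact method.

```python
import math
from typing import Dict, List


def nca_summary(times_h: List[float], conc: List[float], n_terminal: int = 3) -> Dict[str, float]:
    """Non-compartmental PK summary from one concentration-time profile.

    Assumes times are sorted, concentrations are positive in the terminal
    phase, and the last `n_terminal` points lie on a log-linear decline.
    """
    c_max = max(conc)
    t_max = times_h[conc.index(c_max)]  # first time at which Cmax is observed
    # Linear trapezoidal rule up to the last sampled time point.
    auc_last = sum((times_h[i + 1] - times_h[i]) * (conc[i] + conc[i + 1]) / 2
                   for i in range(len(times_h) - 1))
    # Terminal elimination rate constant: minus the slope of ln(C) against time.
    ts = times_h[-n_terminal:]
    ys = [math.log(c) for c in conc[-n_terminal:]]
    t_mean, y_mean = sum(ts) / len(ts), sum(ys) / len(ys)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    lambda_z = -slope
    if lambda_z <= 0:
        raise ValueError("terminal phase is not declining")
    auc_inf = auc_last + conc[-1] / lambda_z  # extrapolate the tail to infinity
    return {"Cmax": c_max, "tmax": t_max, "AUClast": auc_last,
            "lambda_z": lambda_z, "AUCinf": auc_inf}


# Hypothetical single-dose profile (hours, arbitrary concentration units).
summary = nca_summary([0, 1, 2, 4, 8], [0, 10, 8, 4, 1])
print(summary)
```

The extrapolated tail, Clast divided by the elimination rate constant, is the area of the exponential decline beyond the last sample; it should be a small fraction of AUCinf if sampling continued long enough.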
Extended Data Table 1 | Comparison of PI-3065 with Idelalisib (formerly called GS-1101 or CAL-101) and IC87114

Compound | IC50 (nM)
p110δ | p110α | p110β | p110γ | p110δ | p110α | p110β | p110γ | expression
PI-3065 | 15 | 110 | 130 5 | 910 | 600 | >10000
Idelalisib | 14 | 270 | 121 | 16 | 25 | 670 | 260 22 60 41
IC87114 | 34 | >2100 | >2100 | 370 | 46 | >10000 | >10000 | 1300 3500 >7800


Human whole blood was stimulated with anti-IgM followed by FACS for CD69, as described in refs 22 and 23. Human B-cell lymphoma Ri-1 cells were pre-incubated for 30 min with vehicle or compound before stimulation with anti-IgM for 1 h at 37 °C, followed by determination of Akt Ser473 phosphorylation, as described in refs 22 and 23.


22. Murray, J. M. et al. Potent and highly selective benzimidazole inhibitors of PI3-kinase delta. J. Med. Chem. 55, 7686-7695 (2012).
23. Safina, B. S. et al. Discovery of novel PI3-kinase δ specific inhibitors for the treatment of rheumatoid arthritis: taming CYP3A4 time-dependent inhibition. J. Med. Chem. 55, 5887-5900 (2012).
CFIm25 links alternative polyadenylation to glioblastoma tumour suppression


Chioniso P. Masamha1*, Zheng Xia2*, Jingxuan Yang3, Todd R. Albrecht1, Min Li3, Ann-Bin Shyu1, Wei Li2 & Eric J. Wagner1


The global shortening of messenger RNAs through alternative polyadenylation (APA) that occurs during enhanced cellular proliferation represents an important, yet poorly understood, mechanism of regulated gene expression. The 3′ untranslated region (UTR) truncation of growth-promoting mRNA transcripts that relieves intrinsic microRNA- and AU-rich-element-mediated repression has been observed to correlate with cellular transformation; however, the importance to tumorigenicity of RNA 3′-end-processing factors that potentially govern APA is unknown. Here we identify CFIm25 as a broad repressor of proximal poly(A) site usage that, when depleted, increases cell proliferation. Applying a regression model to standard RNA-sequencing data for novel APA events, we identified at least 1,450 genes with shortened 3′ UTRs after CFIm25 knockdown, representing 11% of significantly expressed mRNAs in human cells. Marked increases in the expression of several known oncogenes, including cyclin D1, are observed as a consequence of CFIm25 depletion. Importantly, we identified a subset of CFIm25-regulated APA genes with shortened 3′ UTRs in glioblastoma tumours that have reduced CFIm25 expression. Downregulation of CFIm25 expression in glioblastoma cells enhances their tumorigenic properties and increases tumour size, whereas CFIm25 overexpression reduces these properties and inhibits tumour growth. These findings identify a pivotal role of CFIm25 in governing APA and reveal a previously unknown connection between CFIm25 and glioblastoma tumorigenicity.

Recently, it has become increasingly clear that mRNA 3′-end formation is subject to dynamic regulation under diverse physiological conditions. Over 50% of human genes have multiple polyadenylation signals, thereby increasing the potential diversity in mRNA transcript length. The formation of mRNA transcripts using these distinct poly(A) sites (PASs) is carried out by APA, with the most common form involving differential use of alternative PASs located within the same terminal exon (reviewed in ref. 7). Processing at the PAS most proximal to the stop codon (pPAS) removes negative regulatory elements that reduce mRNA stability or impair translation efficiency, such as AU-rich elements (AREs) and microRNA (miRNA) targeting sites. It has been reported that both rapidly proliferating cells and transformed cells preferentially express mRNAs with shortened 3′ UTRs. Despite these observations, the mechanisms that control the extensive distal-to-proximal PAS switch observed in proliferative and/or transformed cells, the relationship between cause and effect, and the critical target genes subject to this regulation are not well characterized.

To measure relative changes in endogenous APA events, we devised a quantitative polymerase chain reaction after reverse transcription (qRT-PCR) assay to monitor the transcript-specific use of the distal PAS (dPAS), while normalizing for total mRNA levels, for three test transcripts known to undergo APA: cyclin D1 (CCND1), DICER1 and TIMP2. Using this approach, we readily detected appreciable usage of dPASs for all three genes in HeLa cells (Extended Data Fig. 1). This was somewhat surprising given their highly transformed state, but is consistent with previous reports that not all transformed cells tested exhibit appreciable 3′ UTR shortening. Previous studies implicate multiple members of the cleavage and polyadenylation (CPA) machinery as potentially regulating poly(A) site selection. To test the relative contribution of these factors to the APA of the three test genes, we used systematic RNA interference (RNAi) (Fig. 1a-c). We observed only small changes in the relative use of the dPAS after knockdown of members of the cleavage and polyadenylation specificity factor (CPSF), cleavage stimulation factor (CSTF) and cleavage factor IIm (CFIIm) complexes (Fig. 1d-f). By contrast, we detected a significant reduction in dPAS usage after knockdown of members of the CFIm complex. These results are consistent with a recent report that CFIm68 depletion decreases 3′ UTR length; however, the most notable PAS switching occurred after knockdown of CFIm25. We therefore focused all further analyses on CFIm25.
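The qRT-PCR assay above compares use of the distal PAS with total mRNA for each test gene. One common way to express such a measurement is sketched below under stated assumptions: a primer pair located downstream of the pPAS (so it detects only the long isoform), a second pair in the shared upstream region, equal amplification efficiencies of about 100%, and the 2^-ΔΔCt convention. The paper's exact normalization is not given in this section, and the function names and Ct values are hypothetical.

```python
from typing import Tuple


def relative_dpas_usage(ct_distal: float, ct_total: float) -> float:
    """Index of dPAS usage: long-isoform signal relative to total mRNA (2^-dCt).

    ct_distal -- Ct of an amplicon downstream of the pPAS (long isoform only)
    ct_total  -- Ct of an amplicon in the common region (all isoforms)
    Assumes both primer sets double their product every cycle.
    """
    return 2.0 ** -(ct_distal - ct_total)


def dpas_change_vs_control(kd: Tuple[float, float], control: Tuple[float, float]) -> float:
    """Knockdown dPAS usage relative to control (2^-ddCt); < 1 means a shift to the pPAS."""
    return relative_dpas_usage(*kd) / relative_dpas_usage(*control)


# Hypothetical Ct values (distal, total): knockdown delays the distal amplicon by 2 cycles.
control_ct = (24.0, 22.0)
knockdown_ct = (26.0, 22.0)
print(dpas_change_vs_control(knockdown_ct, control_ct))
```

Normalizing to the common-region amplicon separates a change in PAS choice from a change in overall transcript abundance, which matters here because CFIm25 knockdown also raises expression of some targets.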

Traditional methods of global PAS profiling use mRNA partitioning and digestion to sequence poly(A) junctions within messages. To identify global targets of CFIm25 with a more streamlined approach requiring less sample manipulation, we performed high-depth (>3 × 10° reads) RNA sequencing (RNA-seq) after knocking down CFIm25, in parallel with a control knockdown. We determined that 23% of RNA-seq reads can be uniquely mapped to 3′ UTRs of expressed genes, leading to approximately 200-fold sequence coverage (Extended Data Fig. 2a, b). We first analysed the three test genes and observed markedly reduced read density within their 3′ UTRs in response to CFIm25 depletion (Fig. 2a). These results not only confirm our qRT-PCR findings that HeLa cells robustly use the dPAS for all three test genes under basal conditions, but also demonstrate that considerable 3′ UTR shortening induced by CFIm25 knockdown is readily visualized by analysing the read density of RNA-seq data.

On the basis of this promising observation, we applied a novel bioinformatics algorithm termed 'dynamic analysis of alternative polyadenylation from RNA-seq' (DaPars; see Methods) for the de novo identification of all instances of 3′ UTR alterations between control and CFIm25 knockdown cells, regardless of whether a pPAS has been pre-annotated within each RefSeq transcript. DaPars uses a linear regression model to identify the exact location of the novel proximal PAS as the optimal fitting point (Fig. 2b, red point), as well as the abundance of both novel and annotated UTRs. The degree of difference in 3′ UTR usage between the samples was then quantified as a change in the percentage of dPAS usage index (ΔPDUI), which can identify lengthening (positive index) or shortening (negative index) of the 3′ UTR. When applied to the 12,273 RefSeq transcripts whose average terminal exon sequence coverage is more than 30-fold, DaPars identified 1,453 transcripts possessing a significant, reproducible shift in 3′ UTR usage in response to CFIm25 depletion (Fig. 2c and Extended Data Fig. 2c, d). Notably, among this group of transcripts, 1,450 are shifted to pPAS usage in CFIm25 knockdown cells. We found a significant enrichment of the CFIm25 UGUA binding motif and of previously reported CFIm25 iCLIP sequence tags within 3′ UTRs that shortened after CFIm25 knockdown, relative to transcripts exhibiting no length change (Extended Data Fig. 3).
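The regression idea behind DaPars can be illustrated with a simplified two-segment fit. This sketch makes several assumptions: per-base coverage across one 3′ UTR in a single sample, an exhaustive search over candidate breakpoints, and clamping of the short-isoform abundance at zero. It omits DaPars's joint multi-sample model, normalization and statistical testing, and the function names are hypothetical. PDUI is taken as the long-isoform share of expression, so ΔPDUI (knockdown minus control) is negative when the 3′ UTR shortens.

```python
from typing import List, Tuple


def fit_proximal_pas(coverage: List[float], min_seg: int = 5) -> Tuple[int, float, float, float]:
    """Two-segment least-squares fit of 3' UTR read coverage.

    Model: positions upstream of the proximal PAS carry long + short isoforms
    (W_L + W_S); positions downstream carry only the long isoform (W_L).
    Returns (breakpoint index, W_L, W_S, squared error) for the best breakpoint.
    """
    best = None
    for p in range(min_seg, len(coverage) - min_seg + 1):
        upstream, downstream = coverage[:p], coverage[p:]
        total = sum(upstream) / len(upstream)        # least-squares W_L + W_S
        w_long = sum(downstream) / len(downstream)   # least-squares W_L
        w_short = max(total - w_long, 0.0)           # abundance cannot be negative
        err = (sum((c - (w_long + w_short)) ** 2 for c in upstream)
               + sum((c - w_long) ** 2 for c in downstream))
        if best is None or err < best[3]:
            best = (p, w_long, w_short, err)
    if best is None:
        raise ValueError("coverage too short for the minimum segment length")
    return best


def pdui(w_long: float, w_short: float) -> float:
    """Percentage of distal PAS usage index: long-isoform share of expression."""
    if w_long + w_short == 0:
        raise ValueError("no expression")
    return w_long / (w_long + w_short)


# Hypothetical coverage: the proximal PAS sits at position 20 in both samples.
control = [100.0] * 20 + [80.0] * 20   # mostly long isoform
knockdown = [100.0] * 20 + [20.0] * 20  # mostly short isoform
p_c, wl_c, ws_c, _ = fit_proximal_pas(control)
p_k, wl_k, ws_k, _ = fit_proximal_pas(knockdown)
delta_pdui = pdui(wl_k, ws_k) - pdui(wl_c, ws_c)
print(p_c, p_k, round(delta_pdui, 3))
```

The breakpoint with the smallest squared error plays the role of the red point in Fig. 2b. Searching for it directly in the coverage is what allows proximal sites to be found without relying on a pre-annotated pPAS.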


1Department of Biochemistry and Molecular Biology, The University of Texas Medical School at Houston, Houston, Texas 77030, USA. 2Division of Biostatistics, Dan L. Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA. 3The Vivian L. Smith Department of Neurosurgery, The University of Texas Medical School at Houston, Houston, Texas 77030, USA.
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Figure 1 | CFIm25 depletion leads to consistent 
Figure 2 | The DaPars algorithm identifies broad targets of CFIm25 in standard RNA-seq data. a, RNA-seq read density for the 3’ UTR, terminal exon and upstream exon(s) after control (Con.) siRNA treatment and CFIm25 knockdown (KD) in HeLa cells. Numbers on the y-axis indicate RNA-seq read coverage. b, Diagram depicting how differential alternative 3’ UTR usage was identified with DaPars. The y-axis shows the fitting value of the DaPars regression model, and the locus with the minimum fitting value (red point) is the predicted alternative pPAS for the RNA-seq data (bottom). c, Scatterplot of PDUIs in control and CFIm25 knockdown cells, in which mRNAs significantly shortened (n = 1,450) or lengthened (n = 3) after CFIm25 knockdown (false discovery rate (FDR) ≤ 0.05, absolute ΔPDUI ≥ 0.2 and at least a twofold change of PDUIs between CFIm25 knockdown and control cells) are coloured. The shift towards pPAS usage is significant (P < 2.2 × 10^-16, binomial test). d, Correlation between dPAS usage and gene expression levels in control and CFIm25 knockdown cells. The x-axis shows ΔPDUI; a negative value indicates that the pPAS is preferentially used in CFIm25 knockdown cells. The y-axis shows the logarithm of the expression level of genes in the CFIm25 knockdown relative to the control sample. e, Representative RNA-seq density plots, with ΔPDUI values, for genes whose 3’ UTR is shortened in response to CFIm25 knockdown. Numbers on the y-axis indicate RNA-seq read coverage. f, Representative RNA-seq density plots, with ΔPDUI values, for genes whose 3’ UTR is unchanged by CFIm25 knockdown.
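The binomial test in panel c can be sanity-checked with a short sketch. It assumes (our reading, not a stated detail) a null hypothesis in which each significant event is equally likely to be a shortening or a lengthening (p = 0.5), and it works in log space because the exact P value for 1,450 of 1,453 events underflows double precision. This also explains the reported bound: software such as R prints any P value below double-precision machine epsilon (about 2.2 × 10^-16) as "< 2.2e-16".

```python
import math

def log10_binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """log10 P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    log_tail = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_tail / math.log(10)

def log10_two_sided_symmetric(k: int, n: int) -> float:
    """Two-sided log10 P under the assumed p = 0.5 null: twice the more extreme tail, capped at 1."""
    extreme = max(k, n - k)
    return min(0.0, math.log10(2) + log10_binom_tail(extreme, n))

shortened, total = 1450, 1453  # significant events from Fig. 2c
log_p = log10_two_sided_symmetric(shortened, total)
print(f"log10(P) = {log_p:.1f}")  # hundreds of orders of magnitude below log10(2.2e-16)
```

Summing the tail in log space (the log-sum-exp trick) keeps every term representable even though 0.5^1453 alone is far below the smallest double.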
Moreover, we determined that 70% of transcripts whose 3’ UTR is shortened after CFIm25 knockdown use a pPAS within the first third of their 3’ UTR. By contrast, only 29% of multi-PAS transcripts that did not alter 3’ UTR length in response to CFIm25 knockdown have an annotated pPAS in the first third of their 3’ UTR. Thus CFIm25 APA targets are enriched for pPASs positioned close to the stop codon, which maximizes the extent of 3’ UTR shortening. Collectively, these results indicate that CFIm25 broadly represses proximal poly(A) site choice and, consequently, that most CFIm25-regulated transcripts undergo considerable 3’ UTR shortening upon its depletion.
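The "first third" criterion can be expressed as a relative pPAS position within the 3’ UTR. The sketch below assumes transcript coordinates measured from the stop codon, a hypothetical `UtrAnnotation` record, and an inclusive boundary at exactly one-third; the genes and lengths are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class UtrAnnotation:
    """Hypothetical 3' UTR record in transcript coordinates, measured from the stop codon."""
    gene: str
    utr_length: int   # distance from the stop codon to the distal PAS (dPAS)
    ppas_offset: int  # distance from the stop codon to the proximal PAS (pPAS)

def relative_ppas_position(u: UtrAnnotation) -> float:
    """0 = pPAS at the stop codon, 1 = pPAS at the dPAS."""
    return u.ppas_offset / u.utr_length

def fraction_in_first_third(utrs: Iterable[UtrAnnotation]) -> float:
    """Share of transcripts whose pPAS lies within the first third of the 3' UTR (boundary inclusive)."""
    utrs = list(utrs)
    hits = sum(1 for u in utrs if relative_ppas_position(u) <= 1 / 3)
    return hits / len(utrs)

# Invented example: two of three pPASs fall in the first third.
example = [
    UtrAnnotation("GENE_A", utr_length=3000, ppas_offset=600),
    UtrAnnotation("GENE_B", utr_length=1200, ppas_offset=300),
    UtrAnnotation("GENE_C", utr_length=900, ppas_offset=700),
]
print(round(fraction_in_first_third(example), 3))  # 0.667
```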

One potential consequence of 3’ UTR shortening after CFIm25 knockdown is the loss of miRNA-binding sites and/or AREs, resulting in truncated mRNA transcripts that evade negative regulation. Although the correlation between transcript expression change and ΔPDUI was modest (Pearson correlation = −0.25), it does reveal that transcripts with shorter 3’ UTRs in CFIm25 knockdown cells have overall higher expression levels (Fig. 2d). Of the transcripts with shortened 3’ UTRs, 64% exhibited significantly increased steady-state levels, 34% were unchanged and only 2% were significantly reduced (Extended Data Fig. 4). We also tabulated the CFIm25-regulated genes with their ΔPDUI score, change in relative transcript level, and predicted numbers of ARE motifs and miRNA target sites lost after APA (Supplementary Table 1), and observed that the change in gene expression correlates positively with the number of lost ARE motifs and miRNA target sites (Extended Data Fig. 5). Several examples of novel genes whose APA is regulated by CFIm25 are shown in Fig. 2e. Notably, not all long 3’ UTRs shortened in response to CFIm25 knockdown, indicating that the CFIm complex regulates many, but not all, genes capable of APA (Fig. 2f). Collectively, these data demonstrate the power and ease of the DaPars algorithm for identifying APA in standard RNA-seq data, and indicate that the major form of CFIm25 regulation is global repression of pPAS choice.
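For readers reproducing Fig. 2d, the Pearson coefficient is the covariance of ΔPDUI and expression change divided by the product of their standard deviations. A minimal stdlib sketch follows; the five transcripts are invented and are not the paper's data.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented transcripts: negative delta_pdui = shortening after knockdown.
delta_pdui = [-0.6, -0.4, -0.3, 0.0, 0.1]
log2_expr_change = [0.9, 0.2, 0.6, 0.1, -0.2]
print(round(pearson(delta_pdui, log2_expr_change), 2))
```

A negative r, as in the paper, means that transcripts with more negative ΔPDUI (stronger shortening) tend to show larger expression increases.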

To validate the ΔPDUI results, we designed qRT-PCR amplicons to monitor dPAS usage of six genes whose 3’ UTRs were found to be shortened after CFIm25 knockdown and of two genes whose 3’ UTRs were unaltered. Using these amplicons, we analysed RNA isolated from cells effectively depleted of CFIm25 with two independent short interfering RNAs (siRNAs) (Fig. 3a, inset), and observed high congruence between the qRT-PCR results and those obtained by RNA-seq and ΔPDUI analysis (Fig. 3a, graph). To test formally for de-repressed protein expression from mRNAs with shortened 3’ UTRs, we measured the corresponding protein levels in lysates from knockdown cells (Fig. 3b). We observed considerable increases in the protein levels of CFIm25 target genes, including several with a well-documented role in tumour growth, such as cyclin D1, glutaminase and methyl-CpG-binding protein 2 (MECP2). Notably, the 3’ UTR of each of these genes has been shown to be subject to miRNA-mediated inhibition. Consistent with these observations, we also noted enhanced cellular proliferation after knockdown of CFIm25 relative to control knockdown in HeLa cells (Fig. 3c). Finally, to determine whether the 3’ UTR is sufficient to elicit translational de-repression of a heterologous protein in response to CFIm25 knockdown, we used luciferase reporters carrying either the SMOC1 3’ UTR or the GAPDH 3’ UTR, which did not alter its poly(A) site usage. Only the luciferase activity from the luciferase-SMOC1 reporter increased in response to CFIm25 knockdown (Fig. 3d), supporting the idea that the increased expression of endogenous SMOC1 protein upon CFIm25 depletion is mediated through its 3’ UTR.

The collective observations that CFIm25 depletion leads to broad 3’ UTR shortening, enhanced expression of growth-promoting genes and increased cell proliferation support the hypothesis that CFIm25 is a novel anti-proliferative gene whose levels may be reduced in human cancers. We focused our analysis on glioblastoma, as recent reports indicate that brain tissue possesses the longest 3’ UTRs. We reasoned that tumours derived from these cells might be more sensitive to changes in CFIm25 levels than other cancers. To test this prediction, we downloaded
Figure 3 | Increased pPAS usage after CFIm25 depletion results in increased protein translation and enhanced cell proliferation. a, RT-PCR results of select genes shown as fold change in dPAS usage after CFIm25 depletion. Experiments were performed in triplicate, with data shown as mean ± standard deviation (s.d.). The inset shows western blot analysis demonstrating effective knockdown of CFIm25 using two distinct siRNAs. Tub., tubulin. b, Western blot analysis of cell lysates after knockdown of CFIm25 using siRNA. c, Growth of HeLa cells was measured after RNAi of CFIm25 and compared with cells transfected with control siRNA or with siRNA against the CFIIm complex subunit PCF11 (Unr.). Results shown are mean ± s.d. (n = 3). d, Luciferase activity from cells transfected with a luciferase reporter containing the 3’ UTR of either GAPDH or SMOC1 after transfection with either control or CFIm25 siRNA. Data are the average of three independent experiments and error bars show s.d.


archived patient RNA-seq data from The Cancer Genome Atlas (TCGA), stratified it according to CFIm25 expression, and analysed it using DaPars. Indeed, using the same cut-offs as in our HeLa RNA-seq 3’ UTR analysis, we identified 60 genes with altered 3’ UTRs, 59 of which were shortened in glioblastomas expressing lower levels of CFIm25 (Fig. 4a and Supplementary Table 2). Among those genes, a significant number (24 genes; P = 2.2 × 10” by hypergeometric testing) were also shortened in CFIm25 knockdown HeLa cells, and this percentage of overlap increased markedly to 86% as the ΔPDUI cut-off increased from 0.2 to 0.4 (Extended Data Fig. 6). Two representative genes with shortened 3’ UTRs in low CFIm25-expressing glioblastoma tumours, FOS-related antigen 2 (FRA2; also known as FOSL2) and MECP2, are shown in Fig. 4b, demonstrating a compelling similarity between the patient samples and HeLa cells before and after CFIm25 knockdown. Overexpression of either of these genes has been shown to enhance cell proliferation.
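The overlap test above asks how surprising it is that 24 of the 59 glioblastoma shortening events are also among the 1,450 HeLa shortening events. A hypergeometric upper tail answers this once a background universe is chosen. The universe is not stated, so the sketch assumes, purely for illustration, the 12,273 transcripts analysed in HeLa cells; it is therefore not expected to reproduce the published P value.

```python
from math import comb

def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when `draws` items are sampled without replacement from `population`
    items of which `successes` are marked (exact integer arithmetic, one final division)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)

# ILLUSTRATION ONLY: the universe size is an assumption, not taken from the Methods.
universe = 12_273        # assumed background: transcripts analysed in HeLa cells
hela_shortened = 1_450   # shortened after CFIm25 knockdown in HeLa cells
gbm_shortened = 59       # shortened in low-CFIm25 glioblastoma
overlap = 24             # shortened in both
print(f"{hypergeom_upper_tail(overlap, universe, hela_shortened, gbm_shortened):.2e}")
```

Using exact integers from `math.comb` avoids the loss of precision that factorial ratios in floating point would cause; `comb(n, k)` returns 0 when k > n, which handles impossible tables automatically.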

To test formally whether altering CFIm25 expression can modulate glioblastoma tumorigenic properties, we screened a panel of glioblastoma cell lines and observed that U251 cells naturally express lower levels of CFIm25 than LN229 cells (Fig. 4c). To raise CFIm25 levels in U251 cells, we created cell lines stably expressing either Myc-tagged CFIm25 or green fluorescent protein (GFP) as a control. In parallel, we used RNAi to reduce CFIm25 levels in LN229 cells (Fig. 4c). We observed a significant reduction in anchorage-independent growth and cellular invasion in U251 cells overexpressing CFIm25 compared with the GFP control, whereas reducing CFIm25 in LN229 cells increased both of these properties (Extended Data Fig. 7). To determine whether the altered in vitro properties of glioblastoma cells affected tumour growth kinetics in vivo, we used a subcutaneous xenograft model. Increased expression of CFIm25 in U251 cells markedly reduced tumour growth and decreased tumour cell proliferation (Fig. 4d and Extended Data Fig. 8). By contrast, depletion of CFIm25 in LN229 cells caused a profound increase in tumour size (Fig. 4e and Extended Data Fig. 9). Collectively, these results uncover a tumour-suppressive property of CFIm25 in glioblastoma that is probably mediated through its broad repression of APA-dependent mRNA 3’ UTR shortening.

Figure 4 | Altered expression of CFIm25 modulates glioblastoma tumour growth. a, Global analysis of 3’ UTR changes in glioblastoma (GBM) patient samples with either high or low levels of CFIm25. Scatterplot of PDUIs from both data sets using the same cut-offs as in Fig. 2c. The shift to pPAS usage in the low CFIm25 group is significant (P < 2.2 × 10^-16; binomial test). b, Representative UCSC Genome Browser images of RNA-seq data, demonstrating 3’ UTR shortening after CFIm25 knockdown in HeLa cells and in glioblastoma patient samples with high (blue) or low (red) CFIm25 expression. KD, knockdown. c, Western blot analysis of lysates from two glioblastoma cell lines. Note that the overexpressed Myc-CFIm25 also increases endogenous CFIm25 levels in U251 cells. Tub., tubulin; Unt.,

We identified CFIm25 among 15 cleavage and polyadenylation fac- 
tors as a key factor that broadly regulates APA. Importantly, the data 
presented here also extend our understanding of APA in regulated gene 
expression through the demonstration that extensive shortening of 
3’ UTRs causally leads to enhanced cellular proliferation and tumor- 
igenicity, probably through the upregulation of growth promoting 
factors, such as cyclin D1. These results indicate the importance of 3’ 
UTR usage in cell growth control and underscore the need for further 
research into the mechanism and regulation of APA and its potential 
links to other human diseases. 


METHODS SUMMARY 


Human cell lines were cultured using standard techniques. RNAi and western blot experiments were conducted as described previously. For luciferase experiments, one day after the second siRNA hit, cells were transfected with 3’ UTR Renilla luciferase plasmids and activity was assayed after 24 h. Total RNA for qRT-PCR was reverse transcribed using MMLV-RT (Invitrogen). qRT-PCR reactions were performed using SYBRGREEN (Fermentas). Duplicate control and CFIm25 knockdown samples were sequenced on a HiSeq 2000. RNA-seq reads were aligned to hg19 using TopHat 2.0.10. All the TCGA glioblastoma RNA-seq BAM files were downloaded from the UCSC Cancer Genomics Hub (https://cghub.ucsc.edu/). DaPars was used to identify differential 3’ UTR usage from RNA-seq (Z.X. et al., unpublished observations; https://code.google.com/p/dapars). For tumour xenografts, U251 cells were stably transfected with GFP or CFIm25 plasmids, and LN229 cells were transduced with lentivirus expressing CFIm25 shRNA. After subcutaneous injection of the cell lines into nude mice, glioblastoma tumour size was monitored, and tumours were removed and analysed histologically.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
RNA-seq. We used whole-transcriptome RNA-seq to investigate alternative PAS usage genome-wide. Two control and two CFIm25 knockdown samples were sequenced on a HiSeq 2000 (LC Sciences). Paired-end RNA-seq reads with 101 bp at each end were aligned to the human genome (hg19) using TopHat 2.0.10. RefSeq gene expression was quantified by RSEM. A statistical summary of read alignments and average gene expression can be found in Extended Data Fig. 2. More than 12,000 (~50%) human RefSeq genes were detected by RNA-seq with expression levels of more than 1 fragment per kilobase of transcript sequence per million mapped paired-end reads (FPKM). More importantly, on average 23% of RNA-seq reads mapped uniquely to the 3’ UTRs of expressed genes, providing around 200× coverage on UTRs. All the TCGA glioblastoma RNA-seq BAM files were downloaded from the UCSC Cancer Genomics Hub (CGHub; https://cghub.ucsc.edu/).
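The FPKM threshold used above normalizes a fragment count by transcript length (in kilobases) and sequencing depth (in millions of mapped fragments). The sketch uses the textbook formula with the full transcript length; RSEM's own estimate uses effective lengths and probabilistic read assignment, so this is a simplification, and the numbers are invented.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments (textbook form)."""
    per_kb = transcript_length_bp / 1_000
    per_million = total_mapped_fragments / 1_000_000
    return fragments / per_kb / per_million

# Invented example: 500 fragments on a 2 kb transcript in a library of 25 million fragments.
print(fpkm(500, 2_000, 25_000_000))  # 10.0, above the > 1 FPKM detection threshold
```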

Analysis of APA from RNA-seq. We used a novel bioinformatics algorithm, DaPars (Z.X. et al., unpublished observations; https://code.google.com/p/dapars), for the de novo identification of APA from RNA-seq. The observed sequence coverage was represented as a linear combination of the novel and annotated 3’ UTRs. For each RefSeq transcript with an annotated PAS, we used a regression model to infer the end point of an alternative novel PAS within its 3’ UTR at single-nucleotide resolution, by minimizing the deviation between the observed read coverage and the expected read coverage under a two-PAS model, in both control and CFIm25 knockdown samples simultaneously.
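The two-PAS idea can be illustrated with a simplified least-squares search. This is not the DaPars code: it assumes that per-nucleotide coverage across the 3’ UTR is roughly constant upstream of the proximal site (short plus long isoforms) and constant downstream of it (long isoform only), picks one shared breakpoint for all samples by minimizing the summed squared residuals, and reads isoform abundances off the two segment means. Normalization and any regularization used by the real tool are omitted.

```python
from typing import List, Sequence, Tuple

def _sse(values: Sequence[float]) -> float:
    """Residual sum of squares of a constant (mean) fit."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_two_pas(coverages: List[Sequence[float]],
                min_segment: int = 1) -> Tuple[int, List[Tuple[float, float]]]:
    """Return (breakpoint, [(long, short) abundance per sample]).

    Coverage is ordered 5'->3' from the stop codon to the annotated distal PAS.
    The breakpoint is shared by all samples (joint fit); levels are per sample."""
    n = len(coverages[0])
    if any(len(c) != n for c in coverages):
        raise ValueError("all samples must cover the same 3' UTR positions")
    best_b, best_cost = -1, float("inf")
    for b in range(min_segment, n - min_segment + 1):
        cost = sum(_sse(cov[:b]) + _sse(cov[b:]) for cov in coverages)
        if cost < best_cost:
            best_b, best_cost = b, cost
    if best_b < 0:
        raise ValueError("3' UTR too short for the requested min_segment")
    abundances = []
    for cov in coverages:
        upstream = sum(cov[:best_b]) / best_b                # short + long isoforms
        long_isoform = sum(cov[best_b:]) / (n - best_b)      # long isoform only
        short_isoform = max(upstream - long_isoform, 0.0)
        abundances.append((long_isoform, short_isoform))
    return best_b, abundances

# Invented coverage: same proximal site at position 6, knockdown favours the short isoform.
control = [10.0] * 6 + [8.0] * 4
knockdown = [10.0] * 6 + [2.0] * 4
print(fit_two_pas([control, knockdown]))  # (6, [(8.0, 2.0), (2.0, 8.0)])
```

Fitting the breakpoint jointly across samples mirrors the statement that the model is applied to control and knockdown samples simultaneously; the brute-force scan is quadratic in UTR length, which is fine for a sketch.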

To quantify relative PAS usage, we defined the percentage of dPAS usage index (PDUI) for each sample. The greater the PDUI, the more the dPAS of a transcript is used, and vice versa.
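A natural reading of "percentage of dPAS usage" under a two-PAS model is the long-isoform share of total 3’ UTR expression, with ΔPDUI taken as knockdown minus control so that negative values denote shortening, as stated in the main text. Both the formula and the sign convention are assumptions here, and the PDUI is kept as a fraction between 0 and 1 so that it matches the 0.2 ΔPDUI cut-off.

```python
from typing import Tuple

def pdui(long_isoform: float, short_isoform: float) -> float:
    """Assumed PDUI: fraction of 3' UTR expression coming from the long (dPAS) isoform."""
    total = long_isoform + short_isoform
    if total <= 0:
        raise ValueError("no 3' UTR expression")
    return long_isoform / total

def delta_pdui(control: Tuple[float, float], knockdown: Tuple[float, float]) -> float:
    """Knockdown minus control (assumed sign): negative = shortening, positive = lengthening."""
    return pdui(*knockdown) - pdui(*control)

# Invented (long, short) abundances.
print(delta_pdui((8.0, 2.0), (2.0, 8.0)))  # about -0.6: strong shortening after knockdown
```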

ΔPDUI. We used the following three criteria to detect the most significantly shifted 3’ UTR events. First, given the expression levels of the short and long 3’ UTRs in the two samples of each condition, we computed the significance of the difference in mean PDUIs using Fisher’s exact test, adjusted by the Benjamini-Hochberg (BH) procedure to control the FDR at 5%. Second, the absolute difference of mean PDUIs must be no less than 0.2. Third, the absolute log2 ratio (fold change) of mean PDUIs must be no less than 1. To avoid false-positive estimates for low-coverage transcripts, we also required more than 30-fold coverage on the 3’ UTR region in both samples. For genes with multiple annotated PASs, we kept only the one with the greatest absolute ΔPDUI value. In total, we identified 1,453 transcripts with a significant shift in 3’ UTR usage in response to CFIm25 knockdown, the vast majority of which have shortened 3’ UTRs in CFIm25 knockdown cells.
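The filtering steps can be sketched in plain Python. How DaPars builds the 2×2 table is not specified here, so the sketch assumes rounded long- and short-isoform read counts for control versus knockdown, and it tests a single table rather than replicate means. The Fisher's exact test, Benjamini-Hochberg adjustment and thresholds follow the criteria listed above; the counts and coverages are invented.

```python
import math
from typing import List, Sequence

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # tolerance for floating-point ties
            total += p
    return min(1.0, total)

def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH-adjusted P values (q values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def passes_filters(q: float, pdui_ctrl: float, pdui_kd: float, cov_ctrl: float,
                   cov_kd: float, fdr: float = 0.05, min_delta: float = 0.2,
                   min_log2_fc: float = 1.0, min_cov: float = 30.0) -> bool:
    """FDR, |delta PDUI|, |log2 fold change| and coverage criteria from the Methods."""
    if cov_ctrl <= min_cov or cov_kd <= min_cov:
        return False
    if q > fdr or abs(pdui_kd - pdui_ctrl) < min_delta:
        return False
    low, high = sorted((pdui_ctrl, pdui_kd))
    return low == 0 or math.log2(high / low) >= min_log2_fc

# Invented counts: rows = control, knockdown; columns = long, short isoform reads.
p_values = [fisher_exact_two_sided(80, 20, 20, 80), 0.04, 0.6]
q_values = benjamini_hochberg(p_values)
print(passes_filters(q_values[0], pdui_ctrl=0.8, pdui_kd=0.2, cov_ctrl=120.0, cov_kd=95.0))  # True
```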
Bioinformatic analyses of 3’ UTR shortening. As miRNA-binding sites and other regulatory sequences such as AREs reside in 3’ UTRs, APA has an important role in mRNA stability, translation and translocation. Indeed, it has been reported that shorter 3’ UTRs produce higher levels of protein. To elucidate the consequences of 3’ UTR shortening, we listed the numbers of ARE motifs and miRNA-binding sites lost owing to 3’ UTR shortening for the transcripts shifting to proximal 3’ UTR usage in CFIm25 knockdown cells (Supplementary Table 1). The ARE is one of the most prominent cis-acting regulatory elements in 3’ UTRs, targeting mRNAs for rapid degradation. The eight different consensus ARE motifs, including the plain AUUUA pentamer, were retrieved from the ARE site database. miRNA-mRNA binding information was based on the miRNA target prediction database TargetScanHuman version 6.2. To restrict the analysis to high-confidence miRNA sites, we required the probability of preferentially conserved targeting (PCT) score to be more than 0 for all highly conserved miRNA families.
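Counting motifs "lost owing to 3’ UTR shortening" amounts to scanning the sequence between the proximal cleavage point and the distal end. The sketch counts only the plain AUUUA pentamer (not the eight consensus motifs from the ARE site database, and no TargetScan sites), treats `ppas_offset` as a hypothetical cleavage position in transcript coordinates, and counts overlapping matches that lie entirely in the lost segment; motifs straddling the cleavage point are not counted.

```python
def count_overlapping(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of motif in seq."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def lost_are_pentamers(utr_seq: str, ppas_offset: int, motif: str = "AUUUA") -> int:
    """ARE pentamers located in the 3' UTR segment removed when the proximal site is used."""
    rna = utr_seq.upper().replace("T", "U")  # accept DNA or RNA input
    return count_overlapping(rna[ppas_offset:], motif)

# Invented 3' UTR: one AUUUA retained upstream, two (overlapping) AUUUAs lost downstream.
utr = "GCATTTAGG" + "ATTTATTTACC"
print(lost_are_pentamers(utr, ppas_offset=9))  # 2
```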
Differential gene expression analysis. With two replicates in each group, we used edgeR to call differentially expressed genes with FDR < 0.05. To better quantify the expression of genes with shortened 3’ UTRs, we counted reads over the coding region of each transcript.

Cell culture and cell counts. All the cell lines used (HeLa, U251 and LN229) were cultured in DMEM supplemented with 10% FBS (plus 1% penicillin and streptomycin) in a 5% CO2 incubator at 37 °C. Cell counts were done using a standard hemacytometer.

siRNA and western blot assays. Both siRNA transfection and western blot analysis were performed as previously described. siRNAs were purchased from Sigma, and all the siRNAs used are listed below. After transfection, cells were harvested for mRNA extraction, western blotting or Matrigel assay. To detect 3’-end-processing factors by western blotting, the following primary antibodies from Bethyl Laboratories were used: CPSF160, CPSF100, CPSF73, CPSF30, FIP1, CSTF77, CSTF64t, CSTF50, CFIm68 and CFIm59. Other antibodies used include CFIm25 (PTGlabs); CSTF64, CFIIm PCF11 and Symplekin (Sigma); and CFIIm CLP1 (Epitomics). Additional antibodies include VMA21, GLS, ACER3 and GSK-3β (PTGlabs); cyclin D1 (Cell Signaling); and SMOC1 and tubulin (Abcam).

siRNA sequences. We used the following siRNA sequences:
CPSF160 si1: 5’-GCUUUAAGAAGGUCCCUCA; si2: 5’-CUUACCACGUGGAGUCUAA;
CPSF100 si1: 5’-CUCAACUUCUUGAUCAGAU; si2: 5’-GGAUAGAUGGUGUCUUAGA;
CPSF73 si1: 5’-CCAUAUACUGGUCCCUUUA; si2: 5’-GAUAUUGGAAGUUCAGUCA;
CPSF30 si1: 5’-GUGCCUAUAUCUGUGAUUU; si2: 5’-CCUAUAUCUGUGAUUUGAA;
FIP1 si1: 5’-CGAAUGGGACUUGAAGUUA; si2: 5’-GACAAGUACUGCCUCCAGA;
CSTF77 si1: 5’-GAAGACUUAUGAACGCCUU; si2: 5’-CACAGAAUCAACCUAUAGA;
CSTF64 si1: 5’-GGCUUUAGUCCCGGGCAGA; si2: 5’-GGUUAUGGCUUCUGUGAAU;
CSTF64t si1: 5’-GUCUUAGAGACACGUGUAA; si2: 5’-CUAAUGUUCUGCUGAACCA;
CSTF50 si1: 5’-GUCGUAAGUCCGUGCACCA; si2: 5’-CUACUCUUCGCCUUUAUGA;
Symplekin si1: 5’-CAGUUCAACUCGGGCCUGA; si2: 5’-GAGACAUUGAGUUGCUGCU;
CFIm25 si1: 5’-CCUCUUACCAAUUAUACUU; si2: 5’-GCUAUAUACAGUGUAGAAU;
CFIm59 si1: 5’-CUCAUCUGCUCGUGUGGAU; si2: 5’-GCAAUUUCCAGCAGUGCCA;
CFIm68 si1: 5’-CUGCAAUUUCUUUAAUUAA; si2: 5’-GGAUCAAGACGUGAACGAU;
CFIIm CLP1 si1: 5’-GCUUAUGUCUCCAAGGACA; si2: 5’-CAGUUCAGUUGGAGUUGUU;
CFIIm PCF11 si1: 5’-GUACCUUAUGGAUUCUAUU; si2: 5’-GUAUCUCACUGCCUUUACVU.
The control siRNA used was described elsewhere.

qRT-PCR. After the appropriate transfections, total RNA was extracted using TRIzol Reagent (Life Technologies) according to the manufacturer’s protocol. For RT-PCR, the mRNA was reverse transcribed using MMLV-RT (Invitrogen) according to the manufacturer’s protocol to generate cDNA. qRT-PCR reactions were performed using a Stratagene MxPro3000P (Agilent Technologies) and SYBRGREEN (Fermentas). Common primers were designed to target the open reading frame and normalize for total transcript. Distal primers were designed to target sequences just upstream of the dPAS and detect long transcripts that use the dPAS. All primers used are listed below. Data were calculated using a modified version of the $2^{-\Delta\Delta C_T}$ method to show changes in dPAS usage, where $C_T$ is the threshold cycle. First, the $C_T$ values for the common and distal amplicons were normalized to the levels of 7SK:

$$\Delta C_{T,\mathrm{common}} = C_{T,\mathrm{common}} - C_{T,\mathrm{7SK}}, \qquad \Delta C_{T,\mathrm{distal}} = C_{T,\mathrm{distal}} - C_{T,\mathrm{7SK}}$$

Then

$$\Delta\Delta C_T = \Delta C_{T,\mathrm{distal}} - \Delta C_{T,\mathrm{common}}$$

(note that we applied the correction factor for the difference in amplification efficiency calculated in Extended Data Fig. 1). To show fold changes normalized to the control siRNA-transfected samples, the following equation was used:

$$\text{normalized } \Delta\Delta\Delta C_T = \overline{\Delta\Delta C_T}_{\text{target siRNA}} - \overline{\Delta\Delta C_T}_{\text{control siRNA}}$$

Then the decrease (−) or increase (+) in dPAS usage was calculated from $2^{-\text{normalized } \Delta\Delta\Delta C_T}$.
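The calculation above can be followed step by step in code. The sketch assumes that the Extended Data Fig. 1 correction factor is subtracted additively from each ΔΔCT (the form of the correction is not spelled out), and the CT values are invented.

```python
from statistics import mean
from typing import Iterable, Tuple

# One replicate: (CT_common, CT_distal, CT_7SK)
Replicate = Tuple[float, float, float]

def delta_delta_ct(ct_common: float, ct_distal: float, ct_7sk: float,
                   correction: float = 0.0) -> float:
    """dCT(distal) - dCT(common), each normalized to 7SK, minus an assumed additive
    amplicon-efficiency correction."""
    d_common = ct_common - ct_7sk
    d_distal = ct_distal - ct_7sk
    return d_distal - d_common - correction

def dpas_fold_change(target: Iterable[Replicate], control: Iterable[Replicate],
                     correction: float = 0.0) -> float:
    """2^(-normalized dddCT); values below 1 indicate reduced dPAS usage."""
    target_ddct = mean([delta_delta_ct(*r, correction=correction) for r in target])
    control_ddct = mean([delta_delta_ct(*r, correction=correction) for r in control])
    return 2 ** -(target_ddct - control_ddct)

# Invented CTs: the distal amplicon appears one cycle later after knockdown.
control_reps = [(20.0, 21.0, 15.0), (20.0, 21.0, 15.0)]
kd_reps = [(20.0, 22.0, 15.0), (20.0, 22.0, 15.0)]
print(dpas_fold_change(kd_reps, control_reps))  # 0.5, i.e. dPAS usage halved
```

Under this additive assumption the same correction applied to target and control samples cancels in the normalized value; it changes only the un-normalized ΔΔCT.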
Oligonucleotides used for qRT-PCR.
Cyclin D1: common forward, 5’-CTGCCAGGAGCAGATCGAAG; reverse, 5’-AATGCTCCGGAGAGGAGGGACT; distal forward, 5’-ATCGAGAGGCCAAAGGCT; reverse, 5’-CGTCTTTTTGTCTTCTGCTGGA.
DICER1: common forward, 5’-CTCATTATGACTTGCTATGTCGCCTTG; reverse, 5’-CACAATCTCACATGGCTGAGAAG; distal forward, 5’-TGCTTTCCGCAGTCCTAACTATG; reverse, 5’-AATGCCACAGACAAAAATGACC.
TIMP2: common forward, 5’-CAACCCTATCAAGAGGATCCAGTAT; reverse, 5’-GATGTCGAGAAACTCCTGCTTG; distal forward, 5’-GACATCAGCTGTAATCATTCCTGTG; reverse, 5’-CGATGCCAAATGGAGAGC.
FHL1: common forward, 5’-CTGGCACAAAGACTGCTTCACCTGT; reverse, 5’-GATTGTCCTTCATAGGCCACCACACTGG; distal forward, 5’-GCCAGGGCTGTCATCAACATGGATA; reverse, 5’-TGCATTTCAGGTAAGCGGTAGGTGGA.
Tubulin: common forward, 5’-GAAGGCCTCATCCTCCACTTTGGAAAG; reverse, 5’-TGCTAGCAGTGTCTCATGCTCG; distal forward, 5’-GCATCAGTAGCTGAGTGCACTCCTGGT; reverse, 5’-GTAGAGGGTATGAAGGGCAAGAACTCT.
VMA21: common forward, 5’-GATAAGGCGGCGCTGAACGCACTGC; reverse, 5’-TGAGCCTTCATTCCAGGCCACATACACA; distal forward, 5’-CATCTGCACAGCACCTTACAGTTTGG; reverse, 5’-GAAATGCAGCACATCCAAATCCTCCC.
GSK-3β: common forward, 5’-CTGGTCCGAGGAGAACCCAATGTTTCG; reverse, 5’-CAGCCAACACACAGCCAGCAGACCATAG; distal forward, 5’-GAGCTGAGCCCATGGTTGTGTGTAAG; reverse, 5’-GGTTCACTTCAGCAGGCAGGACAACTC.
SMOC1: common forward, 5’-CTCTGATGGCAGGTCCTACGAGTCCA; reverse, 5’-GTATGGCACTGCACCTGGGTAAAGGAG; distal forward, 5’-GAGTCCTGCAATTGTACTGCGGACTCCA; reverse, 5’-CATGGGATCTGGACTCCCTTCCTCTC.
ACER3: common forward, 5’-CACGCTGGACTGGTGCGAGGAGAACT; reverse, 5’-GTGGAAGCACCAGGATCCCATTICCTACG; distal forward, 5’-CTGTTCAAGCTAATACAGCATTTCCT; reverse, 5’-GTGAATAAGCAGACTGAGATTACCTG.
TMEM48: common forward, 5’-CATTCATCCTCAGCAACTCATGCACTG; reverse, 5’-CTGTTAGTACCAGTGCAGGGAACCAC; distal forward, 5’-GTGCTGTGTACTAAATACAGGCCACATAGTG; reverse, 5’-CCTGGTTCCAACAGATGGTGTGTAGA.
MSRB3: common forward, 5’-CTCTGGGAAGTGCGCAGTCCGGGT; reverse, 5’-GTCCCTTTCTCCTGAGTGACATGG; distal forward, 5’-GCAGGATATGGAGTGCAATGAACTGAG; reverse, 5’-ACAGTAAGAGCTGGAGTGCAGAGA.
7SK: forward, 5’-GACATCTGTCACCCCATTGATC; reverse, 5’-TCTGCAGTCTTGGAAGCTTGAC.

Luciferase assays. One day after a second hit with siRNA (as described earlier), HeLa cells were transfected with 0.25 µg of gene-specific 3’ UTR Renilla luciferase plasmids (SMOC1 and GAPDH from Switchgear Genomics) using Lipofectamine 2000 (Invitrogen). Renilla luciferase activity was assayed 24 h after plasmid transfection using Stop and Glo reagent (Promega).


©2014 Macmillan Publishers Limited. All rights reserved


LETTER

Generation of stable cell lines. LN229 cells were transduced with CFIm25-specific shRNA or control shRNA lentivirus using polybrene in 6-well plates. Two days after the first lentiviral transduction, cells were transduced with a second hit of lentivirus. Selection was done using 1 µg ml^-1 of puromycin over 2 weeks. U251 cells were transfected with either GFP- or CFIm25-expressing pcDNA3 plasmids using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s protocol. Selection was performed over 1-2 weeks using 2.5 mg ml^-1 of G418.

Soft agar assay. Soft agar assays were used to determine anchorage-independent growth. For the base layer, 1% UltraPure low-melting-point agarose (Invitrogen) was mixed 1:1 with 2× DMEM and plated in 6-well plates, giving a 1.5 ml bottom layer of 0.5% agar. Then 3 × 10* LN229 cells stably transfected with shRNA were titrated into 2× DMEM and mixed with an equal volume of 0.6% agar to give a 0.3% layer, and 1.5 ml was dispensed into each well. The agar was covered with 1 ml of 1× DMEM and incubated in a humidified incubator at 37 °C (5% CO2). Fresh medium was added once a week. After 2 weeks, the colonies formed were stained with 0.01% crystal violet, photographed and counted. For plasmid-transfected U251 cells the same protocol was followed, except that a third (0.3%) agar layer was plated on top of the layer containing the cell suspension.
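The layer concentrations above follow from simple volume-weighted mixing, C_final = (C1·V1 + C2·V2)/(V1 + V2). The sketch checks the base and cell layers; splitting each 1.5 ml layer into two 0.75 ml halves is an assumption consistent with the 1:1 and equal-volume mixes described.

```python
def mix(conc_a: float, vol_a: float, conc_b: float, vol_b: float) -> float:
    """Final concentration after combining two solutions: (C1*V1 + C2*V2) / (V1 + V2)."""
    return (conc_a * vol_a + conc_b * vol_b) / (vol_a + vol_b)

# Base layer (assumed 0.75 ml + 0.75 ml): 1% agarose with 2x DMEM (no agarose).
base_agar = mix(1.0, 0.75, 0.0, 0.75)   # % agarose
base_dmem = mix(0.0, 0.75, 2.0, 0.75)   # DMEM strength (x)
# Cell layer: 0.6% agar with an equal volume of cells in 2x DMEM.
cell_agar = mix(0.6, 0.75, 0.0, 0.75)
print(base_agar, base_dmem, round(cell_agar, 2))  # 0.5 1.0 0.3
```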

Matrigel invasion assay. The Matrigel invasion assay was performed following the manufacturer’s protocol. Briefly, the 6-well BioCoat Matrigel Invasion Chamber (Becton Dickinson) was rehydrated with FBS-free DMEM. The Matrigel trans-well inserts were then transferred to 6-well plates containing 10% FBS in the bottom well. U251 siRNA-transfected or LN229 shRNA-transfected cells were plated (5 × 10° cells per well) in triplicate wells of the upper chamber in serum-free medium. After 24 h, cells were stained with 0.01% crystal violet, and the number of invading cells was counted at ×20 magnification in 10 fields for each well.

Statistical tests. Unless otherwise specified, experiments were done using three biological replicates, data are shown as mean ± s.d., and statistical analysis was done using a two-tailed Student’s t-test.

Subcutaneous xenograft tumour model. Hsd:Athymic Nude-Foxn1nu nude mice aged 5-6 weeks were used. For each cell line (LN229 or U251), 20 male nude mice were randomly assigned to two groups (n = 10). Stably transfected LN229 and U251 cells were resuspended in pure culture medium at a concentration of 3 × 10’ cells ml^-1. One-hundred-microlitre cell suspensions (3 × 10° cells) were inoculated subcutaneously into the lower right flank of the mice using a 27-gauge needle. Tumour diameters were measured with digital callipers, and the tumour volume in mm^3 was calculated by the formula volume = (width)^2 × length/2. The tumour size data were collected and processed blindly. The animal experiments were performed under the Institutional Review Board approved animal protocol AWC-13-115.
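The two calculations in this protocol are the calliper volume estimate and the number of cells delivered per injection. The sketch assumes that "width" is the shorter of the two diameters (the usual convention), and it uses a hypothetical concentration of 1 × 10^7 cells ml^-1 rather than the study's value.

```python
def tumour_volume_mm3(width_mm: float, length_mm: float) -> float:
    """Calliper estimate: volume = width^2 x length / 2 (width assumed to be the shorter diameter)."""
    return width_mm ** 2 * length_mm / 2

def cells_per_injection(cells_per_ml: float, volume_ul: float) -> float:
    """Cells delivered in an injection of volume_ul microlitres."""
    return cells_per_ml * volume_ul / 1000.0

print(tumour_volume_mm3(4.0, 6.0))      # 48.0
print(cells_per_injection(1e7, 100.0))  # 1000000.0 (hypothetical 1 x 10^7 cells/ml)
```

Whatever the concentration, a 100 µl injection delivers one-tenth of the number of cells contained in 1 ml of suspension.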
Extended Data Figure 1 | Design and optimization of the qRT-PCR assay to monitor APA of three test genes. a, Schematic denoting the relative locations of the common and distal primer annealing sites in each test gene and the approximate locations of the annotated proximal and distal poly(A) sites, depicted as pPAS and dPAS, respectively. The numbers demarcate where the 3’ UTR starts and ends according to ENSEMBL. b, Ethidium-stained agarose gel of RT-PCR products of equal cycle number from the different amplicons using HeLa cell mRNA. c, Both the common and distal cyclin D1 amplicons were cloned in tandem into the same pcDNA3 plasmid. Three dilutions of each plasmid were made and amplified individually with each amplicon in triplicate. The two lines on the graph depict the amplification curves for the common and distal amplicons. Identical cycle threshold (CT) values are expected for each, given that the PCR reactions were conducted using identical amounts of starting material. The average of three individual experiments is shown for each dilution, and the average CT deviation of either amplicon at all of the dilutions was calculated as a correction factor (panel c annotations: Avg. common CT = 13.95; Avg. distal CT = 14.88). d, The experiment shown in c was repeated for DICER1 and TIMP2 to determine their respective correction factors, which were then applied to the experiments shown in Fig. 1.
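The correction factor in panel c is the average CT offset between the distal and common amplicons on an identical template. Because mean(distal − common) equals mean(distal) − mean(common) when every dilution is weighted equally, the panel c averages alone give the cyclin D1 offset, assuming both averages are taken over the same dilutions. The function name below is hypothetical.

```python
from statistics import mean
from typing import Sequence

def amplicon_correction_factor(common_ct: Sequence[float], distal_ct: Sequence[float]) -> float:
    """Average CT offset (distal - common) across paired template dilutions."""
    if len(common_ct) != len(distal_ct) or not common_ct:
        raise ValueError("need paired CT values for each dilution")
    return mean([d - c for c, d in zip(common_ct, distal_ct)])

# Panel c averages for cyclin D1 (Avg. common CT = 13.95, Avg. distal CT = 14.88).
print(round(amplicon_correction_factor([13.95], [14.88]), 2))  # 0.93
```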
Extended Data Figure 2 | Summary of RNA-seq alignment and reproducibility of PDUI and CFIm25-depletion-induced 3’ UTR shortening. a, RNA-seq read statistics of the four biologically independent experiments in which HeLa cells were treated with either control siRNA (Control) or CFIm25 siRNA (CFIm25kD). The pie chart on the right represents the genomic distribution of reads mapped to human genome hg19; the percentage was calculated by averaging all samples. CDS, coding region. b, Histogram of gene expression of RefSeq genes with fragments per kilobase of transcript sequence per million mapped paired-end reads (FPKM) no less than 1. c, Scatterplot of the two biological replicates for each condition, with high Pearson correlation (r = 0.9) demonstrating a high level of reproducibility between sample PDUI scores. Each dot represents the PDUI of a transcript. d, Genome browser screen images from four independent RNA-seq experiments. Each represents an independent biological sample in which HeLa cells were transfected with either the control siRNA (Con.) or an siRNA that knocked down CFIm25. Both VMA21 and SPCS3 were found to undergo 3’ UTR shortening after CFIm25 knockdown, whereas FHL1 was found not to change.
Extended Data Figure 3 | Shortened transcripts have more UGUA CFIm25-binding motifs than unaltered transcripts. a, CFIm25 is known to bind to the UGUA motif. The number of UGUA motifs within the 3’ UTRs of genes with 3’ UTR shortening after CFIm25 knockdown was calculated and compared with that of genes with unaltered 3’ UTRs. Here we selected the genes without 3’ UTR change according to their having an absolute ΔPDUI value ≤ 0.05. b, iCLIP tags from ref. 14 (Gene Expression Omnibus accession number GSE37398) were superimposed onto data derived from PDUI analysis of CFIm25 knockdown cells. The box plot demonstrates the enrichment of CFIm25 binding within 3’ UTRs that are altered after CFIm25 knockdown (P = 6.1 × 10^-107, t-test).
(Plot labels: a, "3’ UTR usage without change", P < 2.2 × 10^-16 by t-test; b, "CFIm25 iCLIP reads density within 3’ UTR" for all genes, genes without differential APAs and genes with differential APAs, P = 6.1 × 10^-107.)
Extended Data Figure 4 | Gene expression changes of genes with shortened 3’ UTRs. The pie chart was calculated from the list of 1,450 genes exhibiting shortened 3’ UTRs due to CFIm25 knockdown (dn, down). Differentially expressed gene analysis was performed using edgeR with FDR < 0.05 (see Methods).
Extended Data Figure 5 | The Pearson correlation between gene expression fold change and the number of lost negative regulatory elements. Left, the number of AREs (AU-rich elements) lost owing to 3’ UTR shortening was calculated using the ARE database and plotted against the change in gene expression levels after CFIm25 knockdown (KD). Right, similar to the left, except that the number of lost predicted miRNA target sites (TargetScan 6.2) was plotted.
(Panels a, b: gene expression fold change versus lost elements; reported correlation 0.28.)
glioblastoma with low CFIm25 and shortening events in HeLa cells after CFIm25 knockdown. Left, y-axis (red) represents the percentage of shortening events in low CFIm25 glioblastoma that are shared with HeLa cells after CFIm25 knockdown. Right, y-axis (blue) shows the number of shortening events in low CFIm25 glioblastoma (GBM). Both are plotted against different ΔPDUI cut-offs.
Extended Data Figure 7 | Overexpression of CFIm25 reduces invasion and colony formation, whereas CFIm25 depletion increases invasion and colony formation. a, U251 cells were transfected with either GFP or CFIm25. Top left, cells were replated in soft agar and the number of colonies/clusters formed was determined. Bottom left, Matrigel invasion assay for cells overexpressing CFIm25 or GFP. b, Top right, LN229 cells were transfected with either control or two different lentiviral plasmids targeting CFIm25 (KD1 and KD2). Stably transfected cells were plated on soft agar and the resulting colonies were counted for KD1 and KD2, respectively. Bottom right, LN229 cells were transfected with either control or two different siRNAs (KD1 and KD2) directed against CFIm25 and were replated for a Matrigel invasion assay. All experiments were done in biological triplicates and are shown as mean ± s.d. All P values are from the two-tailed Student’s t-test of control versus sample. *P < 0.1, **P < 0.01, ***P < 0.001.
tumours were isolated from nude mice on day 84 after implantation and
lentivirus that overexpresses CFIm25.
Extended Data Figure 9 | Reduction in CFIm25 expression levels enhances LN229 tumour size and weight. a, b, LN229 subcutaneous (s.c.) xenograft tumours were isolated from nude mice on day 40 after implantation and measured for volume (a) and weight (b) (n = 10). LN229-shCon. indicates control lentivirus-transduced cells and LN229-shCFIm25 indicates cells transduced with a lentivirus that expresses shRNA targeting CFIm25.
Persistent gut microbiota immaturity in malnourished Bangladeshi children
Qunyuan Zhang, Michael A. Province, William A. Petri Jr, Tahmeed Ahmed & Jeffrey I. Gordon


Therapeutic food interventions have reduced mortality in children with severe acute malnutrition (SAM), but incomplete restoration of healthy growth remains a major problem. The relationships between the type of nutritional intervention, the gut microbiota, and therapeutic responses are unclear. In the current study, bacterial species whose proportional representation defines a healthy gut microbiota as it assembles during the first two postnatal years were identified by applying a machine-learning-based approach to 16S ribosomal RNA data sets generated from monthly faecal samples obtained from birth onwards in a cohort of children living in an urban slum of Dhaka, Bangladesh, who exhibited consistently healthy growth. These age-discriminatory bacterial species were incorporated into a model that computes a ‘relative microbiota maturity index’ and a ‘microbiota-for-age Z-score’, which compare the postnatal assembly (defined here as maturation) of a child’s faecal microbiota with that of healthy children of similar chronologic age. The model was applied to twins and triplets (to test for associations of these indices with genetic and environmental factors, including diarrhoea), to children with SAM enrolled in a randomized trial of two food interventions, and to children with moderate acute malnutrition. Our results indicate that SAM is associated with significant relative microbiota immaturity that is only partially ameliorated following two widely used nutritional interventions. Immaturity is also evident in less severe forms of malnutrition and correlates with anthropometric measurements. Microbiota maturity indices provide a microbial measure of human postnatal development, a way of classifying malnourished states, and a parameter for judging therapeutic efficacy. More prolonged interventions with existing or new therapeutic foods and/or addition of gut microbes may be needed to achieve enduring repair of gut microbiota immaturity in childhood malnutrition and to improve clinical outcomes.

Severe acute malnutrition and moderate acute malnutrition (MAM) are typically defined by anthropometric measurements: children are classified as having SAM if their weight-for-height Z-scores (WHZ) are more than three standard deviations below (−3 s.d.) the median of the World Health Organization (WHO) reference growth standards, whereas those with WHZ between −2 and −3 s.d. are categorized as having MAM. SAM and MAM typically develop between 3 and 24 months after birth. A standardized treatment protocol for SAM and its complications has been developed in Bangladesh. The result has been a reduction in mortality rate, although the extent to which this protocol results in long-term restoration of normal growth and development needs to be ascertained through longitudinal studies. There is a similar lack of clarity about the long-term efficacy of nutritional interventions for MAM.
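The anthropometric thresholds above can be written as a small classifier. This is a minimal Python sketch, not a clinical tool: the function name `classify_acute_malnutrition` is hypothetical, the bilateral pedal oedema criterion is taken from the SAM trial's entry definition in Methods, and assigning a WHZ of exactly −3 to MAM (and exactly −2 to the non-malnourished group) is an assumption of this sketch.

```python
def classify_acute_malnutrition(whz: float, bilateral_pedal_oedema: bool = False) -> str:
    """Classify acute-malnutrition status from a weight-for-height Z-score (WHZ).

    SAM if WHZ < -3 s.d. (or bilateral pedal oedema, as in the SAM trial's
    entry criterion); MAM if -3 <= WHZ < -2. Boundary handling at exactly
    -3 and -2 is an assumption of this sketch.
    """
    if bilateral_pedal_oedema or whz < -3.0:
        return "SAM"
    if whz < -2.0:
        return "MAM"
    return "not acutely malnourished"


if __name__ == "__main__":
    # Group-mean WHZ values quoted in the text, used only as illustrative inputs.
    for label, whz in [("SAM cohort at enrollment", -4.2),
                       ("healthy training set", -0.32)]:
        print(f"{label}: WHZ {whz} -> {classify_acute_malnutrition(whz)}")
```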

Food is a major factor that shapes the proportional representation of organisms present in the gut microbial community (microbiota) and its gene content (microbiome). The microbiota and microbiome in turn have an important role in extracting and metabolizing dietary ingredients. To investigate the hypothesis that healthy postnatal development (maturation) of the gut microbiota is perturbed in malnutrition, we monitored 50 healthy Bangladeshi children monthly during the first 2 years after birth (25 singletons, 11 twin pairs, 1 set of triplets; 996 faecal samples collected monthly; see Methods and Supplementary Tables 1–3). By identifying bacterial taxa that discriminate the microbiota of healthy children at different chronologic ages, we were able to test our hypothesis by studying 6- to 20-month-old children presenting with SAM, just before, during, and after treatment with two very different types of food intervention, as well as children with MAM. The results provide a different perspective on malnutrition, one involving disruption of a microbial facet of our normal human postnatal development.

To characterize gut microbiota maturation across unrelated healthy Bangladeshi children living in separate households, faecal samples were collected at monthly intervals up to 23.4 ± 0.5 months of age in a training set of 12 children who exhibited consistently healthy anthropometric scores (WHZ, −0.32 ± 0.98 (mean ± s.d.); 22.7 ± 1.5 faecal samples per child; Supplementary Table 4a). The bacterial component of their faecal microbiota samples was characterized by V4-16S rRNA sequencing (Supplementary Table 5) and by assigning the resulting reads to operational taxonomic units (OTUs) sharing ≥97% nucleotide sequence identity (see Methods; a 97%-identity OTU is commonly construed as representing a species-level taxon). The relative abundances of 1,222 97%-identity OTUs that passed our filtering criterion were regressed against the chronologic age of each child at the time of faecal sample collection using the Random Forests machine learning algorithm. The regression explained 73% of the variance related to chronologic age. The significance of the fit was established by comparing fitted to null models in which the age labels of samples were randomly permuted with respect to their 16S rRNA microbiota profiles (P = 0.0001, 9,999 permutations). Ranked lists of all bacterial taxa, in order of ‘age-discriminatory importance’, were determined by treating as more important those taxa whose relative abundance values, when permuted, produce a larger marginal increase in mean squared error (see Methods). Tenfold cross-validation was used to estimate age-discriminatory performance as a function of the number of top-ranking taxa according to their feature importance scores. Minimal improvement in predictive performance was observed when including taxa beyond the top 24 (see Supplementary Table 6 for the top 60). The 24 most age-discriminatory taxa identified by Random Forests are shown in Fig. 1a in rank order of their contribution to the predictive accuracy of the model and were selected as inputs to a sparse 24-taxon model.
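The two permutation procedures described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' Random Forests pipeline: the fitted model is replaced by any callable that maps a row of taxon abundances to a predicted age, and importance is computed as the percentage increase in mean squared error over the unpermuted baseline, a simplification of the out-of-bag importance measures that Random Forests implementations report. The function names and the +1 correction in the P value are assumptions of this sketch; with that correction, 9,999 permutations of which none matches the observed fit give the smallest attainable value, 1/10,000 = 0.0001, consistent with the reported P.

```python
import random
from statistics import fmean, stdev
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], float]


def mse(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of the model's predictions."""
    return fmean((model(row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[float],
                           feature: int, n_repeats: int = 100,
                           seed: int = 0) -> Tuple[float, float]:
    """Mean and s.d. of the % increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)  # must be > 0 for a percentage to be defined
    increases: List[float] = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this taxon and age
        X_perm = [list(row[:feature]) + [value] + list(row[feature + 1:])
                  for row, value in zip(X, column)]
        increases.append(100.0 * (mse(model, X_perm, y) - baseline) / baseline)
    return fmean(increases), stdev(increases)


def permutation_p_value(observed: float, null_stats: Sequence[float]) -> float:
    """One-sided P value: (1 + #null statistics >= observed) / (1 + #permutations)."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (1 + exceed) / (1 + len(null_stats))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: "age" depends on taxon 0 only; taxon 1 is noise.
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(50)]
    y = [24 * row[0] + rng.gauss(0, 1) for row in X]

    def model(row: Row) -> float:
        return 24 * row[0]  # stands in for a fitted regressor

    for j in (0, 1):
        print(j, permutation_importance(model, X, y, feature=j))
    print(permutation_p_value(0.73, [0.0] * 9999))  # 0.0001
```

Taxon 1 receives an importance of exactly zero here because the toy model ignores it; in a real forest, uninformative taxa typically score near, not at, zero.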

To test how broadly this sparse model could be applied, we applied it, with no further parameter optimization, to additional monthly faecal samples collected from two other healthy groups of children: 13 singletons (WHZ, −0.4 ± 0.8 (mean ± s.d.)) and 25 children from a birth-cohort study of twins and triplets (WHZ, −0.5 ± 0.7 (mean ± s.d.)), all born and raised in Mirpur, Bangladesh (Supplementary Table 4b, c). We found that the model could be applied to both groups (r² = 0.71 and 0.68, respectively), supporting the consistency of the observed taxonomic signature of microbiota maturation across different healthy children living in this geographic locale (Fig. 1b, c).

Figure 1 | Bacterial taxonomic biomarkers for defining gut-microbiota maturation in healthy Bangladeshi children during the first 2 years of life. a, Twenty-four age-discriminatory bacterial taxa were identified by applying Random Forests regression of their relative abundances in faecal samples against chronologic age in 12 healthy children (n = 272 faecal samples). Shown are 97%-identity OTUs with their deepest level of confident taxonomic annotation (also see Supplementary Table 6), ranked in descending order of their importance to the accuracy of the model. Importance was determined based on the percentage increase in mean-squared error of microbiota age prediction when the relative abundance values of each taxon were randomly permuted (mean importance ± s.d., n = 100 replicates). The inset shows tenfold cross-validation error as a function of the number of input 97%-identity OTUs used to regress against the chronologic age of children in the training set, in order of variable importance (blue line). b, Microbiota age predictions in a birth cohort of healthy singletons used to train the 24-bacterial-taxa model (brown; each circle represents an individual faecal sample). The trained model was subsequently applied to two sets of healthy children: 13 singletons set aside for model testing (green circles, n = 276 faecal samples) and another birth cohort of 25 twins and triplets (blue circles, n = 448 faecal samples). The curve is a smoothed spline fit between microbiota age and chronologic age in the validation sets (right two panels of b), accounting for the observed sigmoidal relationship (see Methods). c, Heatmap of mean relative abundances of the 24 age-predictive bacterial taxa plotted against the chronologic age of healthy singletons used to train the Random Forests model, and correspondingly in the healthy singletons, and twins and triplets, used to validate the model (hierarchical clustering performed using the Spearman rank correlation distance metric).

Two metrics of microbiota maturation were defined by applying the sparse model to the 13 healthy singletons and 25 members of twin pairs and triplets that had been used for model validation. The first metric, relative microbiota maturity, was calculated as follows:

$$\text{relative microbiota maturity} = \text{microbiota age of child} - \text{microbiota age of healthy children of similar chronologic age}$$

where microbiota age values for healthy children were interpolated across the first 2 years of life using a spline fit (Fig. 1b). The second metric, microbiota-for-age Z-score, was calculated as follows:

$$\mathrm{MAZ} = \frac{\text{microbiota age} - \text{median microbiota age of healthy children of same chronologic age}}{\text{s.d. of microbiota age of healthy children of same chronologic age}}$$

where MAZ is the microbiota-for-age Z-score, and the median and s.d. of microbiota age were computed for each month up to 24 months. The MAZ accounts for the variance of predictions of microbiota age as a function of different host age ranges (when considered in discrete monthly bins) (see Extended Data Fig. 1 for the calculation of each metric, and Supplementary Notes for discussion of how this approach defines immaturity as a specific recognizable state rather than as a lack of maturity).

To study the influences of genetic and environmental factors on these microbiota maturation indices, we examined their distribution in healthy Bangladeshi twins and triplets. Monozygotic twins were not significantly more correlated in their maturity profiles than dizygotic twins, and within the set of triplets, the two monozygotic siblings were not more correlated than their fraternal sibling (monozygotic pairs, 0.1 ± 0.5 (Spearman’s rho ± s.d.); dizygotic pairs, 0.33 ± 0.3; in the case of the triplets, values for the monozygotic pair and fraternal sibling were 0.1 and 0.24 ± 0.3, respectively). Maturity was significantly decreased in faecal samples obtained during and 1 month after diarrhoeal episodes (P < 0.001 and P < 0.01, respectively) but not beyond that period (Extended Data Fig. 2). There was no discernible effect of recent antibiotic usage (1 week before sampling) on relative microbiota maturity, whereas intake of infant formula was associated with significantly higher maturity values (Supplementary Table 7). Family membership explained 29% of the total variance in relative microbiota maturity measurements (log-likelihood ratio = 102.1, P < 0.0001; linear mixed model) (see Supplementary Notes, Supplementary Tables 8 and 9, and Extended Data Fig. 3 for analyses of faecal microbiota variation in mother–infant dyads and fathers).
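The two maturity metrics defined in this section can be computed from a healthy reference curve and monthly reference bins. The sketch below is illustrative only: it swaps the paper's smoothing spline for piecewise-linear interpolation between reference points, assumes that "same chronologic age" means the same whole-month bin, and uses hypothetical function names and toy numbers rather than study data.

```python
from bisect import bisect_left
from statistics import median, stdev
from typing import List, Sequence, Tuple


def interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation on sorted xs (stand-in for the spline fit); clamps at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def relative_microbiota_maturity(microbiota_age: float, chronologic_age: float,
                                 ref_ages: Sequence[float],
                                 ref_microbiota_ages: Sequence[float]) -> float:
    """Child's microbiota age minus the healthy reference curve at the same chronologic age."""
    return microbiota_age - interpolate(ref_ages, ref_microbiota_ages, chronologic_age)


def microbiota_for_age_z(microbiota_age: float, chronologic_age: float,
                         healthy: Sequence[Tuple[float, float]]) -> float:
    """MAZ: (microbiota age - healthy median) / healthy s.d., within the child's monthly bin.

    `healthy` holds (chronologic age, microbiota age) pairs, in months, from reference children.
    """
    month = int(chronologic_age)  # whole-month bin (assumption)
    bin_ages: List[float] = [m for c, m in healthy if int(c) == month]
    if len(bin_ages) < 2:
        raise ValueError(f"need at least two healthy samples in month {month}")
    return (microbiota_age - median(bin_ages)) / stdev(bin_ages)


if __name__ == "__main__":
    ref_ages = [0, 6, 12, 18, 24]  # toy reference curve, months
    ref_microbiota = [1, 5, 11, 17, 20]
    print(relative_microbiota_maturity(7.0, 9.0, ref_ages, ref_microbiota))  # -1.0
    healthy = [(9.2, 8.0), (9.5, 9.0), (9.9, 10.0), (10.1, 20.0)]
    print(microbiota_for_age_z(11.0, 9.4, healthy))  # 2.0
```

Relative maturity is expressed in months, whereas MAZ is unitless; the binning step is what lets MAZ account for the age-dependent spread of healthy predictions.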

To investigate the effects of SAM on microbiota maturity, 64 children with SAM who had been admitted to the Nutritional Rehabilitation Unit of the International Centre for Diarrhoeal Disease Research, Bangladesh (ICDDR,B), Dhaka Hospital, were enrolled in a study to investigate the configuration of their faecal microbiota before, during and after treatment with either an imported, internationally used ready-to-use therapeutic food (RUTF; Plumpy’Nut) or a locally produced, lower-cost nutritional food combination (Khichuri-Halwa). Children ranged in age from 6 to 20 months at the time of enrollment and were randomly assigned to either of the treatment arms. At enrollment, WHZ averaged −4.2 ± 0.7 (mean ± s.d.) (see Supplementary Tables 10–12 for patient metadata and Fig. 2a for study design). In the initial ‘acute phase’ of treatment, infection control was achieved with parenteral administration of ampicillin and gentamicin for 2 and 7 days, respectively, and oral amoxicillin for 5 days (from days 3 to 7 of the antibiotic treatment protocol). Children with SAM were initially stabilized by being fed the milk-based gruel ‘suji’, followed by randomization to either an imported peanut-based RUTF intervention or an intervention with locally produced, rice-and-lentil-based therapeutic foods (Khichuri and Halwa; see Methods and Supplementary Table 13 for compositions of all foods used during nutritional rehabilitation). During this second ‘nutritional rehabilitation phase’ (1.3 ± 0.7 weeks long), children received 150–250 kcal kg⁻¹ body weight per day of RUTF or Khichuri-Halwa (3–5 g protein kg⁻¹ per day), plus micronutrients including iron. Children were discharged from the hospital after the completion of this second phase; during the ‘post-intervention phase’, periodic follow-up examinations were performed to monitor health status. Faecal samples were obtained during the acute phase before treatment with Khichuri-Halwa or RUTF, then every 3 days during the nutritional rehabilitation phase, and monthly thereafter during the post-intervention follow-up period.

There was no significant difference in the rate of weight gain between the RUTF and Khichuri-Halwa groups (10.9 ± 4.6 versus 10.4 ± 5.4 g kg⁻¹ body weight per day (mean ± s.d.); Student’s t-test, P = 0.7). The mean WHZ at the completion of nutritional rehabilitation was significantly improved in both treatment groups (−3.1 ± 0.7 (mean ± s.d.) RUTF, P < 0.001; and −2.7 ± 1.6 Khichuri-Halwa, P < 0.0001), but not significantly different between groups (P = 0.15). During follow-up, WHZ remained significantly lower compared to healthy children (−2.1 ± 1.2, Khichuri-Halwa; −2.4 ± 0.8, RUTF; versus −0.5 ± 1.1 for healthy children, P < 0.0001; Extended Data Fig. 4a). Children in both treatment arms also remained markedly below normal height and severely underweight throughout the follow-up period (Extended Data Fig. 4b, c).

The Random Forests model derived from healthy children was used to define relative microbiota maturity for children with SAM at the time of enrollment, during treatment, at the end of either nutritional intervention, and during the months of follow-up. The results revealed that, compared to healthy children, children with SAM had significant microbiota immaturity at the time that nutritional rehabilitation was initiated and at cessation of treatment (Dunnett’s post-hoc test, P < 0.0001 for both groups; Fig. 2b). Within 1 month of follow-up, both groups had improved significantly. However, improvement in this metric was short-lived for the RUTF and Khichuri-Halwa groups, with regression to significant immaturity relative to healthy children beyond 4 months after treatment was stopped (Fig. 2b and Supplementary Table 14). MAZ, like relative microbiota maturity, indicated a transient improvement after RUTF intervention that was not durable beyond 4 months. In the Khichuri-Halwa group, relative microbiota maturity and MAZ improved following treatment but subsequently regressed, exhibiting significant differences relative to healthy children at 2–3 months and >4 months after cessation of treatment (Fig. 2b and Supplementary Table 14).

©2014 Macmillan Publishers Limited. All rights reserved

[Figure 2 image not reproduced: a, trial design (acute phase of SAM, then RUTF or Khichuri-Halwa; panel labels n = 31 children); c–f, microbiota age (months) plotted against chronological age (months) for samples taken prior to and at the end of RUTF or Khichuri-Halwa.]

Figure 2 | Persistent immaturity of the gut microbiota in children with SAM. a, Design of the randomized interventional trial. b, Microbiota maturity defined during various phases of treatment and follow-up in children with SAM. Relative microbiota maturity in the upper portion of the panel is based on the difference between calculated microbiota age (Random-Forests-based sparse 24-taxon model) and values calculated in healthy children of similar chronologic age, as interpolated over the first 2 years of life using a spline curve. In the lower portion of the panel, maturity is expressed as a microbiota-for-age Z-score (MAZ). Mean values ± s.e.m. are plotted. The significance of differences between microbiota indices at various stages of the clinical trial is indicated relative to healthy controls (arrows above the bars) and versus samples collected at enrollment for each intervention group (arrows below the bars) (post-hoc Dunnett’s multiple comparison procedure of linear mixed models; *P < 0.05, **P < 0.01, ***P < 0.001). Healthy children not used to train the Random Forests model served as healthy controls (n = 38). c–f, Plot of microbiota age for each child with SAM at enrollment (c), at the conclusion of the food intervention phase (d), and within (e) and beyond (f) 3 months of follow-up. The curve shown in each panel was fit using predictions in healthy children: this curve is the same as that replicated across each plot in Fig. 1b.

Both food interventions had non-durable effects on other microbiota parameters. The reduced bacterial diversity associated with SAM persisted after Khichuri-Halwa and only transiently improved with RUTF (Extended Data Fig. 5 and Supplementary Table 14). We identified a total of 220 bacterial taxa that were significantly different in their proportional representation in the faecal microbiota of children with SAM compared to healthy children; 165 of these 220 97%-identity OTUs were significantly diminished in the microbiota of children with SAM during the longer-term follow-up period in both treatment groups (Extended Data Figs 6 and 7, and Supplementary Table 15).

Although the majority of children in both treatment arms of the SAM study were unable to provide faecal samples before the initiation of antibiotic treatment due to the severity of their illness, a subset of nine children each provided one or two faecal samples (n = 12) before administration of parenteral ampicillin and gentamicin, and oral amoxicillin. Microbiota immaturity was manifest at this early time-point before antibiotics in these nine children (relative microbiota maturity: −5.15 ± 0.9 months versus −0.03 ± 0.1 for the 38 reference healthy controls; Mann–Whitney, P < 0.0001). Sampling these nine children after treatment with parenteral and oral antibiotics but before initiation of RUTF or Khichuri-Halwa (6 ± 3.6 days after hospital admission) showed that there was no significant effect on microbiota maturity (Wilcoxon matched-pairs rank test, P = 1). When pre-antibiotic faecal samples from these nine children were compared to samples collected at the end of all treatment interventions (dietary and antibiotic, 20 ± 9 days after admission), no significant differences were found in relative microbiota maturity (Wilcoxon, P = 0.7), MAZ, bacterial diversity or WHZ (Extended Data Fig. 8a–d). This is not to say that these interventions were without effects on overall community composition: opposing changes in the relative abundance of Streptococcaceae and Enterobacteriaceae were readily apparent (Extended Data Fig. 8e, f; note that the Random Forests model classified the microbiota of children with SAM sampled both before and at the conclusion of all treatment interventions as immature, indicating the lack of a generic immature state). Although these findings indicate that the relative microbiota immaturity associated with SAM was not solely attributable to the antibiotics used to treat these children, we could not, in cases where children were unable to provide pre-intervention faecal samples, measure the effects of other antibiotics, consumed singly or in various combinations during the acute infection control and nutritional rehabilitation phases, on their metrics of microbiota maturation (see Supplementary Notes and Supplementary Table 16 for further evidence that antibiotic use in the follow-up period did not correlate with the persistence of microbiota immaturity in children with SAM).
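The Wilcoxon matched-pairs comparisons above involve only nine children, which is small enough to compute an exact two-sided P value by enumerating every sign assignment of the ranked differences (2⁹ = 512 assignments). The Python sketch below is one standard way to do this, not necessarily the software the authors used: zero differences are dropped, tied absolute differences receive average ranks, and the function names and example values are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank_exact(before: Sequence[float], after: Sequence[float]) -> float:
    """Exact two-sided P value for the Wilcoxon signed-rank test on paired samples."""
    diffs = [b - a for a, b in zip(before, after) if b != a]  # drop zero differences
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((False, True), repeat=len(diffs)):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - centre) >= observed - 1e-9:  # tolerance for floating-point ties
            extreme += 1
    return extreme / 2 ** len(diffs)


if __name__ == "__main__":
    # Hypothetical relative maturity (months) for nine children before and after antibiotics.
    before = [-5.1, -6.0, -4.2, -5.5, -4.8, -6.3, -5.0, -3.9, -5.6]
    after = [-4.9, -6.2, -4.5, -5.2, -4.8, -6.0, -5.4, -4.1, -5.3]
    print(wilcoxon_signed_rank_exact(before, after))
```

Exhaustive enumeration is only practical for small samples; for larger n, a normal approximation or a library routine would normally be used instead.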

SAM affects approximately 4% of children in developing countries. MAM is more prevalent, particularly in South Central Asia, where it affects approximately 19% of children (30 million). Epidemiological studies indicate that periods of MAM are associated with progression to SAM, and with stunting, which affects >40% of children under the age of five in Bangladesh. We therefore extended our study to children from the singleton cohort at 18 months of age, when all had transitioned to solid foods (n = 10 children with WHZ lower than −2 s.d., the threshold for MAM; 23 children with healthy WHZ; Supplementary Table 17). The relationships of relative microbiota maturity and MAZ with WHZ were significant (Spearman’s rho = 0.62 and 0.63, respectively; P < 0.001; Extended Data Fig. 9a, b). Comparing children with MAM to those defined as healthy revealed significantly lower relative microbiota maturity and MAZ, and differences in the relative abundances of age-discriminatory taxa, in the malnourished group (Extended Data Figs 9d- and 10a, b). These results suggest that microbiota immaturity may be an additional pathophysiological component of moderately malnourished states.
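Spearman's rho, used above to relate the maturity indices to WHZ, is the Pearson correlation of the two variables' ranks. A minimal Python sketch, with tied values given average ranks. The function names and toy values are illustrative rather than study data, and no P value is computed.

```python
import math
from statistics import fmean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Toy values, not study data: relative maturity (months) and WHZ for six children.
    maturity = [-3.1, -1.2, 0.4, -0.8, 0.9, -2.5]
    whz = [-2.6, -1.9, -0.3, -1.1, -0.5, -2.2]
    print(round(spearman_rho(maturity, whz), 3))
```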

In conclusion, definition of microbiota maturity using bacterial taxonomic biomarkers that are highly discriminatory for age in healthy children has provided a way to characterize malnourished states, including whether responses to food interventions endure for prolonged periods of time beyond the immediate period of treatment. RUTF and Khichuri-Halwa produced improvements in microbiota maturity indices that were not sustained. Achieving durable responses in children with varying degrees of malnutrition may involve extending the period of administration of existing or new types of food interventions. One testable hypothesis is that a population’s microbiota, conditioned for generations on a diet, will respond more favourably to nutrient supplementation based on food groups represented in that diet. Next-generation probiotics using gut-derived taxa may also be required in addition to food-based interventions. The functional roles (niches) of the age-discriminatory taxa identified by our Random Forests model need to be clarified, since these taxa may themselves be therapeutic candidates and/or form the basis for low-cost field-based diagnostic assessments.

Systematic analyses of microbiota maturation in different healthy and malnourished populations living in different locales, representing different lifestyles and cultural traditions, may yield a taxonomy-based model that is generally applicable to many countries and types of diagnostic and therapeutic assessments. Alternatively, these analyses may demonstrate a need for geographic specificity when constructing such models (and diagnostic tests or therapeutic regimens). Two observations are notable in this regard. First, expansion of our sparse model from 24 to 60 taxa yielded similar results regarding the effects of diarrhoea in healthy individuals, MAM and SAM (and its treatment with RUTF and Khichuri-Halwa) on microbiota maturity (see Supplementary Notes). Second, we applied the model that we used for Bangladeshi children to healthy children in another population at high risk for malnutrition. The results show that the model generalizes (r² = 0.6) to a cohort of 47 Malawian twins and triplets, aged 0.4–25.1 months, who were concordant for healthy status in a previous study (WHZ, −0.23 ± 0.97 (mean ± s.d.); Supplementary Table 18). Age-discriminatory taxa identified in healthy Bangladeshi children show similar age-dependent changes in their representation in the microbiota of healthy Malawian children, as assessed by the Spearman rank correlation metric (Extended Data Fig. 10c, d).

The question of whether microbiota immaturity associated with SAM and MAM is maintained during and beyond childhood also underscores the need to determine the physiologic, metabolic and immunologic consequences of this immaturity, and how they might contribute to the associated morbidities and sequelae of malnutrition, including increased risk for diarrhoeal disease, stunting, impaired vaccine responses, and cognitive abnormalities. Our study raises a testable hypothesis: namely, that assessments of microbiota maturation, including in the context of the maternal–infant dyad, will provide a more comprehensive view of normal human development and of developmental disorders, and generate new directions for preventive medicine. Testing this hypothesis will require many additional clinical studies, but answers may also arise from analyses of gut microbiota samples that have already been stored from previous studies.


METHODS SUMMARY 


All subjects lived in Dhaka, Bangladesh (see Methods and Supplementary Notes for an anthropologic assessment of Mirpur, the urban slum in Dhaka where most subjects resided). Informed consent was obtained and studies were conducted using protocols approved by the ICDDR,B, Washington University, and University of Virginia institutional review boards (IRBs). Linear mixed models were applied to test hypotheses in repeated measurements of relative abundance of 97%-identity OTUs and maturation metrics in time-series profiling of faecal microbiota. To account for similarity between observations from repeated sampling of the same individuals and families, we fit random intercepts for each subject in the case of adults and singletons, nested these intercepts within each family in the case of twins and triplets, and included age as a fixed-effect covariate, while testing the significance of associations between the microbiota and specified host and environmental factors. Microbiota maturation metrics in each treatment phase of SAM were compared to values at enrollment in each treatment group, and to healthy children within the same age range (excluding samples from children used to train the Random Forests model), using analysis of variance (ANOVA) of linear mixed models followed by Dunnett’s post-hoc comparisons.
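The random-intercept structure described above can be written out for a single outcome (for example, relative microbiota maturity) in twins and triplets. This is a sketch of one conventional parameterization, not the authors' exact model specification: the symbols, the tested covariate x, and the normality assumptions on the random effects are assumptions. Here y_fst is the measurement for subject s in family f at sampling time t, and x_fst is the host or environmental factor being tested; for adults and singletons the family term u_f is dropped, leaving only the subject intercept.

$$
y_{fst} = \beta_0 + \beta_1\,\mathrm{age}_{fst} + \beta_2\,x_{fst} + u_f + v_{fs} + \varepsilon_{fst},
\qquad u_f \sim \mathcal{N}(0, \sigma_f^2),\quad v_{fs} \sim \mathcal{N}(0, \sigma_s^2),\quad \varepsilon_{fst} \sim \mathcal{N}(0, \sigma^2)
$$

Testing an association then amounts to testing whether β₂ differs from zero, with u_f and v_fs absorbing the correlation between repeated samples from the same family and the same child.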


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS

Singleton birth cohort. Full details of the design of this now-complete birth cohort study have been described previously. Faecal microbiota samples were profiled from 25 children who had consistently healthy anthropometric measures based on quarterly (every 3 months) measurements (Supplementary Table 1). The WHZ threshold used for ‘healthy’ (on average above −2 s.d.) was based on median weight and height measurements obtained from age- and gender-matched infants and children by the Multi-Centre Growth Reference Study of the World Health Organization. Clinical parameters, including diarrhoeal episodes and antibiotic consumption, associated with each of their faecal samples are provided in Supplementary Table 2.

A second group studied from this singleton cohort consisted of 33 children sampled cross-sectionally at 18 months, including those who were incorporated as healthy reference controls and those with a WHZ < −2 who were classified as having MAM (Supplementary Table 17).
Twins and triplets birth cohort. Mothers with multiple pregnancies, identified by routine clinical and sonographic assessment at the Radda Maternal Child Health and Family Planning (MCH-FP) Clinic in Dhaka, were enrolled in a prospective longitudinal study (n = 11 mothers with twins, 1 mother with triplets). The zygosity of twin pairs and triplets was determined using plasma DNA and a panel of 96 single-nucleotide polymorphisms (SNPs) (Center for Inherited Disease Research, Johns Hopkins University). Four twin pairs were monozygotic, six were dizygotic, and the set of triplets consisted of a monozygotic pair plus one fraternal sibling (Supplementary Table 1; note that one of the 11 twin pairs could not be tested for zygosity because plasma samples were not available). Information about samples from healthy twins, triplets and their parents, including clinical parameters associated with each faecal sample, is provided in Supplementary Tables 2 and 3.

The three healthy Bangladeshi groups used for model training and validation had the following WHZ scores: −0.32 ± 1 (mean ± s.d.; 12 singletons randomized to the training set), −0.44 ± 0.8 (13 singletons randomized to one of the two validation sets), and −0.46 ± 0.7 (twins and triplets in the other validation set) (Supplementary Table 4). The average number of diarrhoeal episodes in the singleton training set, the singleton validation set, and the twin and triplet validation set (4, 4.6 and 1.7, respectively) was comparable to values reported in previous surveys of another cohort of 0–2-year-old Bangladeshi children (4.25 per child per year).

There were no significant differences in the number of diarrhoeal episodes per
year per child and the number of diarrhoeal days per year per child between the
singleton training and validation sets (Student’s t-test, P = 0.5). Moreover, across
all training and validation sets, neither of these diarrhoeal parameters correlated
with mean age-adjusted Shannon diversity indices (Spearman’s rho, −0.18 and
−0.12; P = 0.22 and 0.4, respectively). The fraction of faecal samples collected
from each child in which oral antibiotics had been consumed within the prior 7 days
was not significantly different between the training and two validation sets (one-way
ANOVA, P = 0.14; see Supplementary Table 4).
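Spearman's rho, used above to relate diarrhoeal burden to age-adjusted Shannon diversity, is the Pearson correlation of the two variables' ranks. A minimal stdlib sketch, assuming tied values receive the mean of the ranks they span; P values (which need a permutation or t approximation) are not computed here:

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

Because only ranks enter the calculation, any monotonic relationship (linear or not) gives |rho| = 1, which is why it suits skewed counts such as diarrhoeal episodes.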
Severe acute malnutrition study. Sixty-four children in the Nutritional Rehabilitation
Unit of ICDDR,B, Dhaka Hospital suffering from SAM (defined as having a WHZ
less than −3 s.d. and/or bilateral pedal oedema) were enrolled in a randomized
interventional trial to compare an imported peanut-based RUTF, Plumpy’Nut
(Nutriset Plumpyfield, India) and locally produced Khichuri-Halwa (clinical trial
NCT01331044). Initially, children were stabilized by rehydration and feeding ‘suji’,
which contains whole bovine milk powder, rice powder, sugar and soybean oil
(approximately 100 kcal kg⁻¹ body weight per day, including 1.5 g protein kg⁻¹
per day). Children were then randomized to the Khichuri-Halwa or RUTF groups. 
Khichuri consists of rice, lentils, green leafy vegetables and soybean oil; Halwa 
consists of wheat flour (atta), lentils, molasses and soybean oil. Children randomized 
to the Khichuri-Halwa treatment arm also received milk suji ‘100’ during their 
nutritional rehabilitation phase (a form of suji with a higher contribution of calories 
from milk powder compared to suji provided during the acute phase). RUTF is 
a ready-to-use paste that does not need to be mixed with water; it consists of peanut
paste mixed with dried skimmed milk, vitamins and minerals (energy density,
5.4 kcal g⁻¹). Khichuri and Halwa are less energy-dense than RUTF (1.45 kcal g⁻¹
and 2.4 kcal g⁻¹, respectively; see Supplementary Table 13 for a list of ingredients
for all foods used during nutritional rehabilitation).

The primary outcome measurement, rate of weight gain (g kg⁻¹ per day), and the
improvement in WHZ after nutritional rehabilitation are reported for each child in
Supplementary Table 10. Faecal samples were collected before randomization to
the RUTF and Khichuri-Halwa treatment arms, every 3 days during nutritional
rehabilitation and once a month during the follow-up period (information associated
with each faecal sample is provided in Supplementary Table 11).
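The primary outcome is expressed per kilogram of body weight per day. A minimal sketch of that arithmetic, assuming the gain is normalized to the child's weight at the start of the interval (the reference weight used in the trial is not stated in this section):

```python
def weight_gain_rate(initial_weight_kg: float, final_weight_kg: float,
                     days: float) -> float:
    """Rate of weight gain in g per kg body weight per day.

    Assumes normalization to the weight at the start of the interval; the
    reference weight used in the trial is not specified here.
    """
    if initial_weight_kg <= 0 or days <= 0:
        raise ValueError("initial weight and duration must be positive")
    gain_g = (final_weight_kg - initial_weight_kg) * 1000.0  # kg -> g
    return gain_g / (initial_weight_kg * days)
```

With hypothetical values, a child growing from 6.0 kg to 6.6 kg over 10 days gains 600 g / (6.0 kg × 10 days) = 10 g kg⁻¹ per day.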
Anthropologic study. To obtain additional information about household practices 
in the Mirpur slum of Dhaka, in-depth semi-structured interviews and observations 
were conducted over the course of 1 month in nine households (n = 30 individuals). 
This survey, approved by the Washington University and ICDDR,B IRBs, involved
three ICDDR,B field research assistants and three senior scientific staff in the
ICDDR,B Centre for Nutrition and Food Security, plus two anthropologists 
affiliated with Washington University in St. Louis. Parameters that might affect 
interpretation of metagenomic analyses of gut microbial-community structure 
were noted, including information about daily food preparation, food storage, 
personal hygiene and childcare practices. 

Characterization of the bacterial component of the gut microbiota by V4-16S 
rRNA sequencing. Faecal samples were frozen at −20 °C within 30 min of their
collection and subsequently stored at −80 °C before extraction of DNA. DNA was
isolated by bead-beating in phenol and chloroform, purified further (QIAquick 
column), quantified (Qubit) and subjected to polymerase chain reaction (PCR) 
using primers directed at variable region 4 (V4) of bacterial 16S rRNA genes. 
Bacterial V4-16S rRNA data sets were generated by multiplex sequencing of amplicons
prepared from 1,897 faecal DNA samples (26,580 ± 26,312 (mean ± s.d.)
reads per sample, paired-end 162- or 250-nucleotide reads; Illumina MiSeq platform;
Supplementary Table 5). Reads of 250 nucleotides in length were trimmed to
162 nucleotides, then all reads were processed using previously described custom
scripts, and overlapped to 253-nucleotide fragments spanning the entire V4 amplicon’».
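The 253-nucleotide fragment length follows from the read geometry: two 162-nucleotide reads from opposite ends of the V4 amplicon overlap by 162 + 162 − 253 = 71 nucleotides. The custom scripts are not reproduced here; as a minimal sketch of the overlap step only, assuming exact matching (`max_mismatches = 0`) and ignoring base qualities, the mate read can be reverse-complemented and joined at the longest matching overlap:

```python
import random
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 20,
               max_mismatches: int = 0) -> Optional[str]:
    """Merge a forward read with its mate from the opposite strand.

    read2 is reverse-complemented onto read1's strand; the longest overlap
    between the end of read1 and the start of that reverse complement with at
    most `max_mismatches` differences is used, keeping read1's bases in the
    overlap. Returns None if no overlap of at least `min_overlap` bases fits.
    """
    rc2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(rc2)), max(min_overlap, 1) - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        if sum(a != b for a, b in zip(tail, head)) <= max_mismatches:
            return read1 + rc2[overlap:]
    return None


# Demo on a synthetic 253-nt amplicon: two 162-nt reads overlap by 71 nt.
_rng = random.Random(42)
demo_amplicon = "".join(_rng.choice("ACGT") for _ in range(253))
demo_merged = merge_pair(demo_amplicon[:162],
                         reverse_complement(demo_amplicon[-162:]))
```

Scanning from the longest overlap downwards means a random 253-nt sequence is reassembled exactly; production mergers additionally score mismatches by base quality.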
‘Mock’ communities, consisting of mixtures of DNAs isolated from 48 sequenced 
bacterial members of the human gut microbiota combined in one equivalent and 
two intentionally varied combinations, were included as internal controls in the 
Illumina MiSeq runs. Data from the mock communities were used for diversity and 
precision-sensitivity analyses employing methods described previously”’. 

Reads with ≥97% nucleotide sequence identity (97%-identity) across all studies
were binned into operational taxonomic units (OTUs) using QIIME (v1.5.0), and
matched to entries in the Greengenes reference database (version 4feb2011)**”.
Reads that did not map to the Greengenes database were clustered de novo with
UCLUST at 97%-identity and retained in further analysis. A total of 1,222 97%-identity
OTUs were found to be present at or above a level of confident detection
(0.1% relative abundance) in at least two faecal samples from all studies. Taxonomy 
was assigned based on the naive Bayesian RDP classifier version 2.4 using 0.8 as the 
minimum confidence threshold for assigning a level of taxonomic classification to 
each 97%-identity OTU. 
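Two thresholds in this paragraph are simple to state in code: the confident-detection filter (at least 0.1% relative abundance in at least two samples) and the 0.8 minimum confidence for each taxonomic rank. A minimal sketch, assuming a plain dict-based OTU table and a lineage given as (name, bootstrap confidence) pairs from the root downwards; this is not QIIME's or the RDP classifier's own code:

```python
from typing import Dict, List, Sequence, Tuple


def confidently_detected(otu_table: Dict[str, Sequence[int]],
                         min_rel_abundance: float = 0.001,
                         min_samples: int = 2) -> List[str]:
    """OTUs at >= 0.1% relative abundance in at least two samples.

    otu_table maps OTU ID -> read counts, one entry per sample, with the
    same sample order for every OTU.
    """
    if not otu_table:
        return []
    n_samples = len(next(iter(otu_table.values())))
    # Total reads per sample, used to convert counts to relative abundances.
    totals = [sum(counts[s] for counts in otu_table.values()) for s in range(n_samples)]
    kept = []
    for otu, counts in otu_table.items():
        hits = sum(1 for s in range(n_samples)
                   if totals[s] > 0 and counts[s] / totals[s] >= min_rel_abundance)
        if hits >= min_samples:
            kept.append(otu)
    return kept


def truncate_lineage(lineage: Sequence[Tuple[str, float]],
                     min_confidence: float = 0.8) -> List[str]:
    """Keep ranks from the root down until a rank's confidence falls below the cut-off."""
    assigned = []
    for name, confidence in lineage:
        if confidence < min_confidence:
            break
        assigned.append(name)
    return assigned
```

For instance, a lineage whose family-level confidence is 0.6 would be reported only down to order, which is why many OTUs in this study carry annotations such as "Clostridiales sp.".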

Definition of gut-microbiota maturation in healthy children using Random 
Forests. Random Forests regression was used to regress relative abundances of 
OTUs in the time-series profiling of the microbiota of healthy singletons against 
their chronologic age using default parameters of the R implementation of the 
algorithm (R package ‘randomForest’, ntree = 10,000, using default mtry of p/3 
where p is the number of input 97%-identity OTUs (features))’*. Because the Random
Forests algorithm makes no parametric assumptions, it was used to detect both
linear and nonlinear relationships between OTUs and chronologic age,
thereby identifying taxa that discriminate different periods of postnatal life in
healthy children. A rarefied OTU table at 2,000 sequences per sample served as
input data. Ranked lists of taxa in order of Random Forests reported ‘feature
importance’ were determined over 100 iterations of the algorithm. To estimate
the minimal number of top-ranking age-discriminatory taxa required for prediction,
the rfcv function implemented in the ‘randomForest’ package was applied
over 100 iterations. A sparse model consisting of the top 24 taxa was then trained 
on the training set of 12 healthy singletons (272 faecal samples). Without any 
further parameter optimization, this model was validated in other healthy children 
(13 singletons, 25 twins and triplets) and then applied to samples from children 
with SAM and MAM. A smoothing spline function was fit between microbiota age 
and chronologic age of the host (at the time of faecal sample collection) for healthy 
children in the validation sets to which the sparse model was applied. 
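Both the Random Forests input and the diversity estimates use an OTU table rarefied to 2,000 reads per sample: each sample's reads are randomly subsampled without replacement to a common depth, so that differences in sequencing depth do not masquerade as differences in community composition. A minimal stdlib sketch, assuming per-sample counts stored as a dict (QIIME's own rarefaction script is not reproduced here):

```python
import random
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int = 2000,
           seed: Optional[int] = None) -> Optional[Dict[str, int]]:
    """Randomly subsample one sample's reads, without replacement, to `depth` reads.

    Returns None when the sample has fewer reads than `depth`; in this sketch
    such samples are treated as excluded from a table rarefied to that depth.
    """
    # Expand counts into one entry per read so every read is equally likely to be drawn.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    rarefied: Dict[str, int] = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied
```

Fixing `seed` makes a rarefied table reproducible; because a single draw is random, repeated rarefactions are often averaged.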

Alpha diversity comparisons. Estimates of within-sample diversity were made at 
a rarefaction depth of 2,000 reads per sample. A linear regression was fit between 
the Shannon diversity index (SDI) and postnatal age in the 50 healthy children 
using a mixed model (see the additional details regarding statistical methods, below). 
Estimates of the slope of SDI with age and of the intercept were
extracted, the residuals of this regression were defined as the ΔSDI metric, and
associations of this metric with clinical parameters were tested in the cohort of healthy
twins and triplets. To test for differences in SDI as a function of health status and
chronologic age in malnourished children, we compared the distribution of age-adjusted
ΔSDIs in children with SAM between treatment phases.
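The ΔSDI metric is a residual: observed diversity minus the diversity expected for the child's age from the healthy-reference fit. A minimal sketch, assuming natural-log Shannon entropy (the log base only rescales the index) and replacing the study's mixed model with ordinary least squares, which ignores the repeated sampling of each child:

```python
from math import log
from typing import List, Sequence, Tuple


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs observed in a sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope of y on x (stand-in for the mixed model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def delta_sdi(ages: Sequence[float], sdis: Sequence[float],
              intercept: float, slope: float) -> List[float]:
    """Age-adjusted diversity: observed SDI minus the healthy-reference prediction."""
    return [s - (intercept + slope * a) for a, s in zip(ages, sdis)]
```

In use, `ols_fit` would be run on healthy children's (age, SDI) pairs and the resulting intercept and slope passed to `delta_sdi` for any other samples, so a negative ΔSDI means lower diversity than expected for age.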

Detection of associations of bacterial taxa with nutritional status and other 
parameters. Relative abundances of 97%-identity OTUs were used in linear mixed
models as response variables to test for associations with clinical metadata as
predictors. For each comparison, we restricted our analysis to 97%-identity OTUs and
bacterial families whose relative abundance values reached a level of confident
detection (0.1%) in a minimum of 1% of samples in each comparison. Pseudocounts
of 1 were added to 97%-identity OTUs to account for variable depth of sequencing
between samples, and relative abundances were arcsine-square-root-transformed to
approximate homoscedasticity when applying linear models. P values of associations
of factors with the relative abundance of bacterial taxa were computed using
type III ANOVA (tests of fixed effects) and subjected to Benjamini-Hochberg false
discovery rate (FDR) correction.
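The variance-stabilizing transform and the FDR correction described above can be sketched in a few lines. A minimal stdlib version, assuming the pseudocount is added to raw read counts before conversion to proportions (the section does not say at which stage it is added); the mixed models themselves are not shown:

```python
from math import asin, sqrt
from typing import List, Sequence


def arcsine_sqrt(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Add a pseudocount to each OTU, renormalize to proportions, return arcsin(sqrt(p))."""
    adjusted = [c + pseudocount for c in counts]
    total = sum(adjusted)
    return [asin(sqrt(c / total)) for c in adjusted]


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping q = p * m / rank monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw P values (0.01, 0.04, 0.03, 0.5) adjust to about (0.04, 0.053, 0.053, 0.5), so only the first would pass an FDR threshold of 0.05.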

Enteropathogen testing. Clinical microscopy was performed for all faecal samples 
collected at monthly intervals from the singleton birth cohort and from healthy 
twins and triplets, and samples were screened for Entamoeba histolytica, Entamoeba dispar,
Escherichia coli, Blastocystis hominis, Trichomonas hominis,
Coccidian-like bodies, Giardia lamblia, Ascaris lumbricoides, Trichuris trichiura,
Ancylostoma duodenale/Necator americanus, Hymenolepis nana, Endolimax nana,
Iodamoeba butschlii and Chilomastix mesnili. The effects of enteropathogens detected
by microscopy on relative microbiota maturity, MAZ and SDI were included
in our analysis of multiple environmental factors in Extended Data Fig. 2 and
Supplementary Table 7. In cases in which children presented with SAM plus
diarrhoea, faecal samples collected before nutritional rehabilitation were cultured
for Vibrio cholerae, Shigella flexneri, Shigella boydii, Shigella sonnei, Salmonella enterica,
Aeromonas hydrophila and Hafnia alvei. See Supplementary Tables 10 and 19 for
results of enteropathogen testing. 

Additional details regarding statistical methods. Linear mixed models were 
applied to test for associations of microbiota metrics (relative microbiota maturity, 
MAZ and SDI) with genetic and environmental factors in twins and triplets. Log-likelihood
ratio tests and F tests were used to perform backward elimination of non-significant
random and fixed effects*’. Relative microbiota maturity, MAZ and SDI
were defined at different phases of treatment and at defined periods of follow-up
(<1 month, 1-2, 3-4 and >4 months after completion of the RUTF or Khichuri-Halwa
nutritional intervention) in children with SAM relative to healthy children. ‘Treatment
phase’ was specified as a categorical multi-level factor in a univariate mixed
model with random by-child intercepts. Dunnett’s post-hoc comparison procedure 
was performed to compare each treatment phase relative to healthy controls and 
relative to samples collected at enrollment in each food intervention group. 
Extended Data Figure 1 | Illustration of the equations used to calculate
‘relative microbiota maturity’ and ‘microbiota-for-age Z-score’. a, b, The
procedure used to calculate both microbiota maturation metrics is shown for a
single faecal sample from a focal child (pink circle) relative to microbiota age
values calculated in healthy reference controls. These reference values are
computed in samples collected from children used to validate the Random
Forests-based sparse 24-taxon model and are shown in a, as a broken line of the
interpolated spline fit, and in b, as median ± s.d. values for each monthly
chronologic age bin from months 1 to 24.
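The two metrics illustrated in Extended Data Fig. 1 reduce to simple formulas. A minimal sketch, assuming relative microbiota maturity is the child's microbiota age minus the healthy-reference microbiota age at the same chronologic age, and MAZ is the deviation from the healthy median of the child's monthly age bin divided by that bin's s.d.; piecewise-linear interpolation is swapped in for the smoothing spline used in the study:

```python
from bisect import bisect_left
from statistics import median, stdev
from typing import Sequence


def interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation on increasing xs (stand-in for the spline fit)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # now xs[j - 1] < x <= xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def relative_microbiota_maturity(microbiota_age: float, chronologic_age: float,
                                 ref_chrono_ages: Sequence[float],
                                 ref_microbiota_ages: Sequence[float]) -> float:
    """Child's microbiota age minus the healthy reference at the same chronologic age."""
    expected = interpolate(ref_chrono_ages, ref_microbiota_ages, chronologic_age)
    return microbiota_age - expected


def microbiota_for_age_z(microbiota_age: float,
                         healthy_ages_in_bin: Sequence[float]) -> float:
    """MAZ: deviation from the healthy median of the same monthly age bin, in s.d. units."""
    return (microbiota_age - median(healthy_ages_in_bin)) / stdev(healthy_ages_in_bin)
```

With hypothetical numbers, a sample whose predicted microbiota age is 8 months, compared with a healthy bin of 9, 10 and 11 months, has MAZ = (8 − 10) / 1 = −2, i.e. it is two s.d. less mature than expected.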
[Extended Data Fig. 2 panels: a, relative microbiota maturity (mo) in samples collected before the first diarrhoeal episode (n = 23), during diarrhoea (n = 36) and at intervals since the last diarrhoeal episode (<1 month, n = 28; 1-2, 2-3, 3-4 and 4-6 months), from faecal samples collected during the subsequent diarrhoea-free period; b, effect on age-adjusted alpha diversity (Shannon diversity index, SDI).]

Extended Data Figure 2 | Transient microbiota immaturity and reduction
in diversity associated with diarrhoea in healthy twins and triplets. a, The
transient effect of diarrhoea in healthy children. Seventeen children from 10
families with healthy twins or triplets had a total of 36 diarrhoeal illnesses where
faecal samples were collected. Faecal samples collected in the months
immediately before and following diarrhoea in these children were examined
in an analysis that included multiple environmental factors in the ‘healthy
twins and triplets’ birth cohort. Linear mixed models of these specified
environmental factors indicated that ‘diarrhoea’, ‘month following diarrhoea’
and ‘presence of formula in diet’ have significant effects on relative microbiota
maturity, while accounting for random effects arising from within-family and
within-child dependence in measurements of this maturity metric. The factors
‘postnatal age’, ‘presence or absence of solid foods’, ‘exclusive breastfeeding’,
‘enteropathogen detected by microscopy’, ‘antibiotics’ as well as ‘other
periods relative to diarrhoea’ had no significant effect. The numbers of faecal
samples (n) are shown in parentheses. Mean values ± s.e.m. are plotted.
*P < 0.05, ***P < 0.001. See Supplementary Table 7 for the effects of dietary
and environmental covariates. b, Effect of diarrhoea and recovery on age-adjusted
Shannon diversity index (SDI). Mean values of effect on SDI ± s.e.m.
are plotted. *P < 0.05, **P < 0.01.
[Extended Data Fig. 3 panels: a, heatmaps for mothers and for their twins and triplets, rows are 97%-identity OTUs with taxonomic annotations, colour shows relative abundance of a bacterial taxon (minimum to maximum); b-e, distance comparisons on a scale from less similar to more similar.]
Extended Data Figure 3 | Gut microbiota variation in families with twins
and triplets during the first year of life. a, Maternal influence. Heatmap of the
mean relative abundances of 13 bacterial taxa (97%-identity OTUs) found to be
statistically significantly enriched in the first month post-partum in the faecal
microbiota of mothers (see column labelled 1) compared to microbiota
sampled between the second and twelfth months post-partum (FDR-corrected
P < 0.05; ANOVA of linear mixed-effects model with random by-mother
intercepts). An analogous heatmap of the relative abundance of these
taxa in their twin or triplet offspring is shown. Three of these 97%-identity
OTUs are members of the top 24 age-discriminatory taxa (blue) and belong to
the genus Bifidobacterium. b-e, Comparisons of maternal, paternal and infant
microbiota. Mean values ± s.e.m. of Hellinger and unweighted UniFrac
distances between the faecal microbiota of family members sampled over time
were computed. Samples obtained at postnatal months 1, 4, 10 and 12 from
twins and triplets, mothers and fathers were analysed (n = 12 fathers;
12 mothers; 25 children). b, Intrapersonal variation in the bacterial component
of the maternal microbiota is greater between the first and fourth months after
childbirth than variation in fathers. c, Distances between the faecal microbiota
of spouses (each mother-father pair) compared to distances between all
unrelated adults (male-female pairs). The microbial signature of co-habitation
is only evident 10 months following childbirth. d, e, The degree of similarity
between mother and infant during the first postpartum month is
significantly greater than the similarity between microbiota of fathers and
infants (d), while the faecal microbiota of co-twins are significantly more similar
to one another than to age-matched unrelated children during the first
year of life (e). For all distance analyses, Hellinger and unweighted
UniFrac distance matrices were permuted 1,000 times between the groups
tested. P values represent the fraction of times permuted differences
between tested groups were greater than real differences between groups.
*P < 0.05, **P < 0.01, ***P < 0.001.
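The distance comparisons in panels b-e rest on two computations: a distance between two samples' OTU profiles and a permutation test on group differences. A minimal stdlib sketch of both, assuming the Hellinger distance is the Euclidean distance between square-root-transformed relative abundances, and simplifying the matrix permutation to shuffling group labels over a flat list of pairwise distances (which ignores that distances sharing a sample are not independent); unweighted UniFrac, which needs a phylogenetic tree, is not shown:

```python
import random
from math import sqrt
from statistics import mean
from typing import Sequence


def hellinger(a: Sequence[float], b: Sequence[float]) -> float:
    """Hellinger distance: Euclidean distance between square-rooted relative abundances."""
    ta, tb = sum(a), sum(b)
    return sqrt(sum((sqrt(x / ta) - sqrt(y / tb)) ** 2 for x, y in zip(a, b)))


def permutation_p_value(within: Sequence[float], between: Sequence[float],
                        n_permutations: int = 1000, seed: int = 0) -> float:
    """One-sided P that 'between' distances exceed 'within' distances on average.

    Group labels are shuffled n_permutations times; P is the fraction of
    shuffles whose difference in means is greater than the observed one.
    """
    rng = random.Random(seed)
    observed = mean(between) - mean(within)
    pooled = list(within) + list(between)
    n_within = len(within)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[n_within:]) - mean(pooled[:n_within]) > observed:
            exceed += 1
    return exceed / n_permutations
```

Here "within" might hold co-twin distances and "between" distances to age-matched unrelated children; a small P then indicates that co-twins are more similar than chance would allow.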
Extended Data Figure 4 | Anthropometric measures of nutritional status in
children with SAM before, during and after both food interventions.
a-c, Weight-for-height Z-scores (WHZ) (a), height-for-age Z-scores (HAZ)
(b) and weight-for-age Z-scores (WAZ) (c). Mean values ± s.e.m. are plotted
and referenced to national average anthropometric values for children surveyed
between the ages of 6 and 24 months during the 2011 Bangladesh
Demographic and Health Survey (BDHS)**.
Extended Data Figure 5 | Persistent reduction of diversity in the gut
microbiota of children with SAM. Age-adjusted Shannon diversity index for
faecal microbiota samples collected from healthy children (n = 50), and
from children with SAM at various phases of the clinical trial (mean
values ± s.e.m. are plotted). The significance of differences between SDI at
various stages of the clinical trial is indicated relative to healthy controls
(above the bars) and versus the time of enrollment before treatment (below the
bars). *P < 0.05, **P < 0.01, ***P < 0.001 (post-hoc Dunnett’s multiple
comparison procedure of linear mixed models). See Supplementary
Table 14.
[Extended Data Fig. 6 heatmap: columns, healthy children (1-24 mo, by chronologic age bin) and children with SAM in the RUTF and Khichuri-Halwa arms (enrollment, during treatment, discharge, follow-up <3 and >3 mo); rows, 97%-identity OTUs with taxonomic annotations; colour, relative abundance of a bacterial taxon (minimum to maximum). Panel a, taxa enriched at enrollment in children with SAM relative to healthy; panel b, taxa enriched during follow-up in children with SAM relative to healthy; panel c, taxa depleted before but not after intervention in SAM relative to healthy.]

Extended Data Figure 6 | Heatmap of bacterial taxa significantly altered
during the acute phase of treatment and nutritional rehabilitation in the
microbiota of children with SAM compared to similar-age healthy children.
Bacterial taxa (97%-identity OTUs) significantly altered (FDR-corrected
P < 0.05) in children with SAM are shown (see Supplementary Table 15 for
P values and effect size for individual taxa). Three groups of bacterial taxa are
shown: those enriched before the food intervention (a); those enriched during
the follow-up phase compared to healthy controls (b); and those that are
initially depleted but return to healthy levels (c). Members of the top
24 age-discriminatory taxa are highlighted in blue. Note that there were no
children represented in the Khichuri-Halwa arm under the age of 12 months
during the ‘follow-up after 3 months’ period.


©2014 Macmillan Publishers Limited. All rights reserved 


[Extended Data Fig. 7 heatmap: columns, healthy children (1-24 mo, by chronologic age bin) and children with SAM in the RUTF and Khichuri-Halwa arms (enrollment, during treatment, discharge, follow-up <3 and >3 mo); rows, 97%-identity OTUs with taxonomic annotations; colour, relative abundance of a bacterial taxon (minimum to maximum). Panel a, taxa depleted before and after intervention in SAM relative to healthy; panel b, taxa depleted during follow-up in SAM relative to healthy.]


Extended Data Figure 7 | Heatmap of bacterial taxa altered during long-term
follow-up in the faecal microbiota of children with SAM compared to
similar-age healthy children. a, b, Bacterial taxa (97%-identity OTUs)
significantly altered (FDR-corrected P < 0.05) in children with SAM are shown
(see Supplementary Table 15 for P values and effect sizes for individual taxa).
a, Taxa depleted across all phases of SAM relative to healthy. b, Those
depleted during the follow-up phase. Members of the top 24 age-discriminatory
taxa are highlighted in blue. Note that there were no children under the
age of 12 months represented in the Khichuri-Halwa treatment arm during the
‘follow-up after 3 months’ period.
Extended Data Figure 8 | Effects of antibiotics on the microbiota of children
with SAM. Plots of microbiota and anthropometric parameters in nine
children sampled before antibiotics (abx), after oral amoxicillin plus parenteral
gentamicin and ampicillin, and at the end of the antibiotic and dietary
interventions administered over the course of nutritional rehabilitation in the
hospital. All comparisons were made relative to the pre-antibiotic sample using
the non-parametric Wilcoxon matched-pairs signed-rank test, in which each child
served as his or her own control. a-c, Microbiota parameters, plotted as mean
values ± s.e.m., include relative microbiota maturity, microbiota-for-age
Z-score (MAZ), and SDI. WHZ scores are provided in d. e, f, The two
predominant bacterial family-level taxa showing significant changes following
antibiotic treatment. ns, not significant; **P < 0.01.



LETTER 


[Extended Data Fig. 9 panels: a, relative microbiota maturity versus WHZ score at 18 months (Spearman rho = 0.62; P = 0.0001); b, MAZ versus WHZ score at 18 months (Spearman rho = 0.63; P < 0.0001); c, microbiota diversity versus WHZ score at 18 months (not significant; P = 0.16); d-l, relative abundances of Faecalibacterium prausnitzii 326792 (d), Dorea longicatena 191687 (e), Lactobacillus mucosae 15141 (f), Catenibacterium mitsuokai 287510 (g), Dorea formicigenerans 261912 (h), Clostridium sp. 181834 (i), Bifidobacterium sp. 469873 (j), Clostridiales sp. 185951 (k) and Ruminococcaceae sp. 212619 (l) in children with MAM (n = 10) and similarly aged healthy controls (n = 23).]


Extended Data Figure 9 | Relative microbiota maturity and MAZ correlate
with WHZ in children with MAM. a-c, WHZ are significantly positively
correlated with relative microbiota maturity (a) and MAZ (b) in a cross-sectional
analysis of 33 children at 18 months of age who were above and below
the anthropometric threshold for MAM (Spearman’s rho = 0.62 and 0.63,
respectively; ***P < 0.001). In contrast, there is no significant correlation
between WHZ and microbiota diversity (c). d-l, Relative abundances of
age-discriminatory 97%-identity OTUs that are inputs to the Random Forests
model and that are significantly different in the faecal microbiota of children with
MAM compared to age-matched 18-month-old healthy controls (Mann-Whitney
U-test, P < 0.05). Box plots show the upper and lower quartiles (boxes) and the
median (middle horizontal line); whiskers extend to the most extreme measurements
within 1.5 times the interquartile range above the 75th percentile and below the
25th percentile, and measurements beyond the whiskers are plotted as points
(Tukey’s method, PRISM software v6.0d). Taxa are presented in descending order
of their importance to the Random Forests model. See Extended Data Fig. 10a, b.
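The Tukey convention used for these box plots can be stated compactly in code. A minimal sketch using Python's `statistics.quantiles` with the "inclusive" method; quartile definitions differ between software packages, so the quartiles here may not match PRISM's exactly:

```python
from statistics import quantiles
from typing import Dict, List, Sequence, Union


def tukey_box(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Box-plot summary with Tukey whiskers (1.5 x IQR fences); needs >= 2 values."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Observations beyond the fences are drawn as individual points.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }
```

For the values 1 to 8 plus 100, the quartiles are 3 and 7, the fences lie at −3 and 13, the upper whisker stops at 8, and 100 is plotted as a point.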
Extended Data Figure 10 | Cross-sectional assessment of microbiota
maturity at 18 months of age in Bangladeshi children with and without
MAM, plus extension of the Bangladeshi-based model of microbiota
maturity to Malawi. a, b, Children with MAM (WHZ lower than −2 s.d.;
grey) have significantly lower relative microbiota maturity (a) and MAZ
(b) compared to healthy individuals (blue). Mean values ± s.e.m. are plotted.
**P < 0.01 (Mann-Whitney U-test). See Extended Data Fig. 9 for correlations
of metrics of microbiota maturation with WHZ and box plots of age-discriminatory
taxa whose relative abundances are significantly different in
children with MAM relative to healthy reference controls. c, Microbiota age
predictions resulting from application of the Bangladeshi 24-taxon model to 47
faecal samples (brown circles) obtained from concordant healthy Malawian
twins and triplets are plotted versus the chronologic age of the Malawian donor
(collection occurred in individuals ranging from 0.4 to 25.1 months old). The
results show that the Bangladeshi model generalizes to this population, which is also
at high risk for malnutrition (each circle represents an individual faecal sample
collected during the course of a previous study''). d, Spearman rho and
significance of rank order correlations between the relative abundances of
age-discriminatory taxa and the chronologic age of all healthy Bangladeshi
children described in the present study as well as concordant healthy Malawian
twins and triplets. *P < 0.05.
Ribosomal oxygenases are structurally conserved
from prokaryotes to humans 


Rasheduzzaman Chowdhury, Rok Sekirnik, Nigel C. Brissett, Tobias Krojer, Chia-hua Ho, Stanley S. Ng, Ian J. Clifton,
Wei Ge, Nadia J. Kershaw, Gavin C. Fox, Joao R. C. Muniz, Melanie Vollmar, Claire Phillips, Ewa S. Pilka,
Kathryn L. Kavanagh, Frank von Delft, Udo Oppermann, Michael A. McDonough, Aidan J. Doherty
& Christopher J. Schofield


2-Oxoglutarate (2OG)-dependent oxygenases have important roles
in the regulation of gene expression via demethylation of N-methylated
chromatin components and in the hydroxylation of transcription
factors and splicing factor proteins. Recently, 2OG-dependent oxygenases
that catalyse hydroxylation of transfer RNA and ribosomal
proteins have been shown to be important in translation relating to
cellular growth, TH17-cell differentiation and translational accuracy.
The finding that ribosomal oxygenases (ROXs) occur in organisms
ranging from prokaryotes to humans raises questions as to their
structural and evolutionary relationships. In Escherichia coli, YcfD
catalyses arginine hydroxylation in the ribosomal protein L16; in
humans, MYC-induced nuclear antigen (MINA53; also known as
MINA) and nucleolar protein 66 (NO66) catalyse histidine hydroxylation
in the ribosomal proteins RPL27A and RPL8, respectively.
The functional assignments of ROXs open therapeutic possibilities
via either ROX inhibition or targeting of differentially modified ribosomes.
Despite differences in the residue and protein selectivities of
prokaryotic and eukaryotic ROXs, comparison of the crystal structures
of E. coli YcfD and Rhodothermus marinus YcfD with those of
human MINA53 and NO66 reveals highly conserved folds and novel
dimerization modes defining a new structural subfamily of 2OG-dependent
oxygenases. ROX structures with and without their substrates
support their functional assignments as hydroxylases but not
demethylases, and reveal how the subfamily has evolved to catalyse
the hydroxylation of different residue side chains of ribosomal proteins.
Comparison of ROX crystal structures with those of other JmjC-domain-containing
hydroxylases, including the hypoxia-inducible
factor asparaginyl hydroxylase FIH and histone Nε-methyl lysine
demethylases, identifies branch points in 2OG-dependent oxygenase
evolution and distinguishes between JmjC-containing hydroxylases
and demethylases catalysing modifications of translational
and transcriptional machinery. The structures reveal that new protein
hydroxylation activities can evolve by changing the coordination
position from which the iron-bound substrate-oxidizing species
reacts. This coordination flexibility has probably contributed to the
evolution of the wide range of reactions catalysed by oxygenases.
To investigate the structural basis of catalytic differences within the
ROX subfamily of JmjC-domain-containing hydroxylases and their relationship
with the JmjC-containing histone Nε-methyl lysine demethylases
(KDMs; Fig. 1a), we conducted structural analyses on both prokaryotic
ROXs (initially YcfD from E. coli (EcYcfD) and subsequently that from the
thermophile R. marinus (RmYcfD)) and human ROXs (MINA53(26–465)
and NO66(183–641)). We used RmYcfD to obtain a YcfD substrate structure.
All four ROXs show marked similarities in their folds: the JmjC
domain is followed by helical dimerization and carboxy-terminal ‘winged
helix’ (WH) domains (Fig. 1b). The ROX JmjC domains consist of
11–12 β-strands, 8 of which (I–VIII) form a double-stranded β-helix
(DSBH), which is stereotypical of 2OG-dependent oxygenases (Fig. 1c
and Extended Data Fig. 1).

The dimerization domains have two-fold symmetry and comprise
a bundle of three α-helices (Extended Data Fig. 2); the dimers are
stabilized by electrostatic and hydrogen-bonding as well as hydrophobic
interactions. Consistent with a catalytic role for this domain, the
dimerization-blocking substitutions EcYcfD(I211R) and MINA53(R313E)
decrease activity. Hydrogen-bonding and electrostatic interactions are
substantially more important in RmYcfD dimerization than in the other
ROXs, consistent with the increased occurrence of electrostatic interactions
in thermophiles. The ROX C-terminal domains, which are required
for activity (Extended Data Fig. 3), are reminiscent of WH domains
involved in protein–protein and protein–nucleic-acid interactions;
however, their overall negative charge suggests that they may not directly bind
nucleic acids. In contrast to ROXs, other JmjC-containing hydroxylases
(FIH, tRNA yW-synthesizing protein 5 (TYW5), JmjC-domain-containing
protein 4 (JMJD4), JMJD5 (ref. 18) and JMJD6 (ref. 4)) and KDMs
do not contain a WH domain (Fig. 2). The combined structures lead to
the proposal that the ROX fold evolved into those of JmjC-containing
hydroxylases and KDMs partly via loss of the WH domain, which enabled
the C-terminal helical bundle to take on other roles, as in KDMs, or to adopt
the dimerization mode observed in FIH.

ROX structures were determined in complex with Mn(II) and 2OG
or N-oxalylglycine (NOG), replacing Fe(II) and 2OG. As for most
2OG-dependent oxygenases, the metal is octahedrally coordinated by a
2-His-1-carboxylate triad from DSBH βII and βVII (Fig. 3); two coordination
sites are occupied by the 2OG/NOG oxalyl group, leaving one for H2O/O2
binding (Fig. 4 and Extended Data Fig. 4). With the YcfDs, the NOG C5
carboxylate is positioned to form a salt bridge with EcYcfD Arg 140 or RmYcfD
Arg 148 on DSBH βIV (Extended Data Fig. 4). This arrangement is
notable because, in other 2OG-dependent oxygenases in which the 2OG
C5 carboxylate interacts with an Arg residue, that Arg is located on βVIII.
In human ROXs, the 2OG C5-carboxylate-interacting residue is a lysine
(MINA53 Lys 194, NO66 Lys 355) from βIV, as in most JmjC-containing
hydroxylases and KDMs. These observations suggest that eukaryotic
JmjC-containing hydroxylases and KDMs evolved from prokaryotic ROXs.

Initial attempts to obtain substrate complexes by co-crystallization or
crystal soaking were unsuccessful. We therefore pursued alternatives,
one of which involved using a thermostable YcfD homologue, which we
considered might have a relatively low substrate dissociation constant (Kd),
enabling complex crystallization. RmYcfD (which has 31% identity with
EcYcfD) catalyses hydroxylation of Arg 82 in an L16 fragment (20-residue
peptide, amino acids Lys 72–Glu 91) with an approximately sevenfold lower
Michaelis constant (Km) than EcYcfD (268 µM and 1.9 mM, respectively).
An RmYcfD-L16(72–91) structure, obtained by co-crystallization, was solved
Figure 1 | The overall folds of the ribosomal oxygenases. a, Reactions
catalysed by ROX and related oxygenases. ARD, ankyrin repeat domain; CAD,
C-terminal transactivation domain of HIF-α. b, Ribbon representations of
EcYcfD, RmYcfD, MINA53 and NO66 homodimers. The monomers contain a
JmjC domain with the DSBH core present in all 2OG-dependent oxygenases
(blue), followed by dimerization (yellow) and C-terminal WH domains (red).
The domain architecture and a schematic representation of the DSBH core
β-strands (βI–βVIII) that form the major (blue; βI, βVIII, βIII and βVI) and minor
sheets (grey; βII, βVII, βIV and βV) are shown boxed. The insert between βIV and
βV (purple) is involved in substrate binding. The three Fe-coordinating
residues are on the βII and βVII strands (black circles). 2OG is shown as green sticks;
the 2OG C5-carboxylate-binding residue from βIV, Arg (YcfDs) or Lys (human ROXs),
is shown as a black circle.


by molecular replacement using the apo EcYcfD structure (Protein Data
Bank (PDB) accession 4CCL). The overall EcYcfD and RmYcfD structures
are similar (Cα root-mean-square deviation (r.m.s.d.) 1.58 Å); L16
residues Lys 77–Lys 85 are visible in the electron density map (Fig. 3c).
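The ‘approximately sevenfold’ difference in Km quoted above follows directly from the two measured values once they are expressed in the same unit:

```latex
\frac{K_\mathrm{m}(\text{EcYcfD})}{K_\mathrm{m}(\text{RmYcfD})}
  = \frac{1.9\ \text{mM}}{268\ \mu\text{M}}
  = \frac{1900\ \mu\text{M}}{268\ \mu\text{M}}
  \approx 7.1
```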

For the human ROXs, we used electrospray ionization mass spectrometry-guided
disulphide crosslinking to obtain substrate complexes
(Extended Data Fig. 5). Structures were obtained for the wild-type
NO66-RPL8(G220C) (complex 1), NO66(L299C/C300S)-RPL8(G220C)
(complex 2) and NO66(S373C)-RPL8(G214C) (complex 3) pairs. Electron
density corresponding to RPL8 residues 215–223 (complex 1), 213–223
(complex 2) and 212–223 (complex 3) was observed at the active site
(Fig. 3b and Extended Data Fig. 5). The RPL8 residues 215–219,
including the hydroxylated His 216, adopt near-identical conformations
(Cα r.m.s.d., 0.29–0.36 Å), implying that all three structures
represent catalytically functional complexes (Extended Data Fig. 5). In the
light of the NO66-RPL8 structures, we identified a MINA53 residue
(Tyr 209) suitable for crosslinking: MINA53(Y209C) crystallized in complex
with RPL27A(G37C), with electron density observed for RPL27A
residues 36–44 (Fig. 3a). Further validation of the functional relevance
of the crosslinked structures comes from comparisons with the wild-type
RmYcfD-L16 structure and from kinetic studies demonstrating activity
with most variants (Extended Data Fig. 6).
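The Cα r.m.s.d. values quoted in this section summarize how closely matched atoms coincide once the structures have been superposed. A minimal sketch of the r.m.s.d. formula follows; it assumes the two coordinate lists are already optimally superposed (the superposition step itself, for example a Kabsch fit as performed by structural alignment software, is not shown), and the coordinates and names in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """r.m.s.d. (in the coordinate units, e.g. Å) over matched atom pairs.

    Assumes a[i] and b[i] are equivalent atoms (e.g. the Cα of the same residue)
    and that the two structures have already been superposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


# Hypothetical Cα coordinates for three matched residues in two superposed models.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.3), (7.6, -0.3, 0.0)]
print(f"Cα r.m.s.d. = {rmsd(model_1, model_2):.2f} Å")
```

Here every atom is displaced by 0.3 Å, so the script prints `Cα r.m.s.d. = 0.30 Å`.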

MINA53 and NO66 bind their RPL27A and RPL8 substrates in a conserved
manner (Cα r.m.s.d. for RPL27A(38–43) versus RPL8(215–220), 0.8 Å).
Comparison of human ROX and RmYcfD complexes reveals similarities in
substrate binding, particularly for the hydroxylated residue and for
substrate residues on the amino-terminal side of the hydroxylated residue.
In all ROX complexes, substrates bind with the same N/C directionality,
as observed for FIH and for one KDM, plant homeodomain finger
8 (PHF8) (and probably other KDM2/7 subfamily members), but
differing from that for most KDMs (KDM4A, KDM6B and KDM6A)
(Fig. 2). The substrates bind in shallow channels on the ROX surfaces
and form multiple interactions and hydrogen bonds with residues from
DSBH βI, βII and βVIII and the extended βIV–βV loop. Although the
N-terminal regions of RPL27A (amino acids 36–39), RPL8 (213–216)
and L16 (78–81) bind similarly, the C-terminal regions of RPL27A
(40–44) and RPL8 (217–223) form more extensive interactions with the human
ROXs than does L16 (83–85) with RmYcfD (Fig. 3). Notably, both the RPL27A
and RPL8 substrates make hydrophobic contacts with the WH domains
of MINA53 and NO66 (Extended Data Fig. 3). In addition, MINA53
forms a catalytically important salt-bridge interaction between RPL27A
Arg 42 and MINA53 Asp 333, located on the α-helix connecting the
dimerization and WH domains (Extended Data Figs 6 and 7).

The general binding mode of the hydroxylated residues is conserved
between prokaryotic and human ROXs: they bind in deep pockets,
and the positions of the hydroxylated β-methylenes nearly superimpose
(Fig. 4). There are, however, clear differences in the way human ROXs
and RmYcfD bind their target residue side chains (Fig. 3). With human
ROXs, the binding of RPL27A His 39/RPL8 His 216 involves a series of
hydrogen bonds to backbone amides or side chains of human ROX
residues: MINA53 Gln 136/NO66 Arg 297; MINA53 Asn 165/NO66 Asn 326;
MINA53 Tyr 167/NO66 Tyr 328; and MINA53 Ser 257/NO66 Ser 421
(Fig. 3a, b). With RmYcfD, Arg 82 ‘slots’ into a hydrophobic cleft
defined by the RmYcfD Tyr 137 and Met 120 side chains and hydrogen bonds
to RmYcfD Asp 118 and Ser 208 (Fig. 3c). Mutagenesis studies on ROXs
support the observed binding modes of the substrate residues (Extended
Data Figs 6 and 8).
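Statements such as ‘hydrogen bonds to RmYcfD Asp 118 and Ser 208’ rest on interatomic geometry measured in the refined models (the donor-acceptor distances are listed in the Figure 3 legend). The sketch below shows the simplest distance-only test; the 3.5 Å cutoff is a common rule of thumb assumed here, not a criterion stated by the authors, real assignments usually also consider angles, and the coordinates and function names are hypothetical.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]

# Assumed distance-only criterion (Å); angular criteria are ignored in this sketch.
HBOND_MAX_DIST = 3.5


def hbond_by_distance(donor: Coord, acceptor: Coord,
                      max_dist: float = HBOND_MAX_DIST) -> Tuple[bool, float]:
    """Return (is_candidate_hbond, donor-acceptor distance in Å)."""
    d = math.dist(donor, acceptor)
    return d <= max_dist, d


# Hypothetical coordinates placing a His nitrogen and a Tyr hydroxyl 2.9 Å apart.
his_n = (10.0, 5.0, 2.0)
tyr_oh = (12.9, 5.0, 2.0)
ok, d = hbond_by_distance(his_n, tyr_oh)
print(ok, round(d, 2))
```

With these placeholder coordinates the script prints `True 2.9`.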

There are conflicting reports as to the catalytic activities of some
JmjC-containing hydroxylases, including NO66, which has been classified
as both a hydroxylase and a KDM. Comparison of ROXs with
KDMs and FIH (Figs 2 and 4a) identifies distinctive structural features
characteristic of JmjC-containing hydroxylases and KDMs, in addition
to the roles of the WH domains. This is important because it supports
the assignment of hydroxylase (but not demethylase) activities to ROXs
and other human JmjC-containing hydroxylases, for example, FIH and
JMJD6 (ref. 4). In our assays with isolated MINA53 and NO66, we have
consistently not observed enzyme-catalysed demethylation under
conditions in which JmjC-containing KDMs are active. Although we cannot
rule out the possibility that some of the JmjC-containing hydroxylases
may have KDM activities under different conditions or in cells, the
multiple structures reported here suggest that, for this to occur, substantial
active-site rearrangements would be required on substrate binding. In
Figure 2 | Comparison of the substrate structures for ROXs and
JmjC-containing enzymes. a–f, Ribbon representations of ROX
and related 2OG-dependent oxygenase-substrate complexes.
a, MINA53-Mn-2OG-RPL27A(32–50) (P2₁2₁2₁, 2.05 Å).
b, NO66-Mn-NOG-RPL8(205–224) (C2, 2.35 Å).
c, RmYcfD-Mn-NOG-L16(72–91) (P2₁22, 3.0 Å).
d, FIH-Fe-NOG-HIF-1α(786–826) (PDB accession 1H2K).
e, PHF8-Fe-NOG-H3K4me3K9me2(2–25) (PDB accession 3KV4).
PHD, plant homeodomain. f, KDM4A-Ni-NOG-H3K9me2(7–14)
(PDB accession 2OX0). For comparison, the DSBH core of each
structure is in a similar orientation. Note the directionality of
substrate binding in the JmjC domains. The active-site metals
(Fe or surrogate) are shown as colour-coded spheres. Analyses of the
structures reveal that the ROX overall folds (a–c), oligomerization
states and active-site architectures are evolutionarily conserved.


Figure 3 | Features of ROX-substrate binding.
a–c, Ribbon representations of MINA53 (a),
NO66 (b) and RmYcfD (c) monomers showing
difference electron density (Fo − Fc omit map) for
substrates contoured to 3σ (right panels). Left
panels depict active-site surface representations,
showing key hydrogen bonds and polar
interactions (dotted lines) with substrates. a, With
MINA53, the RPL27A His 39 imidazole nitrogens
form hydrogen bonds with Tyr 167/Ser 257
(His 39 Nδ–Tyr 167 OH, 2.9 Å; His 39 Nε–Ser 257 Oγ, 3.1 Å).
b, In NO66, RPL8 His 216 is similarly bound in a
deep pocket; the RPL8 His 216 imidazole nitrogens
form hydrogen bonds with Tyr 328/Ser 421
(His 216 Nδ–Tyr 328 OH, 3.2 Å; His 216 Nε–Ser 421 Oγ, 2.7 Å)
and hydrophobic interactions with Ile 244
that project its pro-S hydrogen towards the metal
(metal–β-CH2, 4.4 Å). a, b, Whereas MINA53
(a) uses four primary amides (Asn 101, Gln 136,
Gln 139 and Asn 165) to interact with RPL27A
backbone amides, NO66 (b) uses two arginines
(Arg 272, Arg 297) to hydrogen bond with the RPL8
Asn 215 side chain and the RPL8 His 216 backbone.
c, In the RmYcfD-L16 complex, L16 Arg 82
binds in a pocket defined by the Tyr 137/Met 120
side chains, which form π-cation and hydrophobic
interactions with the L16 Arg 82 side chain. The
Arg 82 guanidino group makes electrostatic
interactions with the RmYcfD Asp 118 carboxylate
(O–NH, 2.8–3.1 Å) and hydrogen bonds to
RmYcfD Ser 208 (Arg 82 Nε–Ser 208 OH, 3.5 Å;
Arg 82 NH–Ser 208 CO, 3.2 Å). Although MINA53
Tyr 167 and NO66 Tyr 328 are not positionally
related to RmYcfD Tyr 137, the role of the βVIII serine
(MINA53 Ser 257, NO66 Ser 421, RmYcfD Ser 208)
in binding the hydroxylated His/Arg is
conserved in ROXs. Substitutions of these residues
cause marked loss of activity (see Extended Data
Fig. 6).


424 | NATURE | VOL 510 | 19 JUNE 2014 
©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 4a (top): structure-based sequence alignment of the DSBH cores (βI–βVIII, including the βIV–βV insert) of RmYcfD, EcYcfD, NO66, MINA53, FIH, PHF8 and KDM4A, with a legend for β-sheet, dimerization domain, WH, PHD, Tudor and JmjN elements; panel b: overlaid active-site views of NO66, MINA53 and related enzymes.]


Figure 4 | Proposed sequence of evolution of active-site chemistry of ROXs
and related JmjC-containing 2OG-dependent oxygenases. a, b, The figure
compares views from active sites of representative JmjC-containing enzymes
and suggests how the ROX fold evolved into JmjC-containing hydroxylases and
KDMs. Structurally informed cross-genomic bioinformatic analyses imply
that ROXs are the earliest identified JmjC-containing 2OG-dependent
oxygenases; YcfD and NO66 both exist in prokaryotes, but only NO66 is
identified in eukaryotes. Coupled to the analyses of the active sites, these
analyses imply that NO66 and its close relatives are the precursors of MINA53
and other JmjC-containing hydroxylases and KDMs. a, Top, structure-based
alignment of ROX, FIH, PHF8 and KDM4A with the DSBH core strands labelled
βI–βVIII; the iron-coordinating and the 2OG C5-carboxylate-binding residues
are indicated in red and green, respectively. Bottom, analyses of active sites suggest
conservation of metal/2OG binding in ROX, FIH and KDMs. Note the 2OG

ROXs and KDM4A-H3K9me2 (PDB accession 2OX0) complex
structures, the different substrates bind with ‘opposite’ N-to-C
directionalities with respect to the catalytic machinery. The histone K9me
side chain is positioned similarly to the ROX-hydroxylated residue side
chains; however, because KDM-catalysed hydroxylations occur at the termini
of Nε-methyl lysine residues, their target residues do not penetrate as
far into the enzyme active site (Fig. 4a). The ROXs also lack two flexible
loops linking α4–βI (amino acids 164–175) and α9–β10 (amino acids
302–317) in KDM4A, which are conserved in KDMs and which
form important interactions with the Kme side chain, illustrating how
the ‘core’ ROX fold has been modified by evolution to accommodate
the Kme side chain.

Like ROXs, FIH catalyses β-hydroxylation of an Asn residue in its
HIF-α transcription factor substrate and of other residues, including
histidines in ankyrins. Superimposition of human ROX and FIH-substrate
structures is interesting from catalytic and evolutionary
perspectives. Although both FIH and human ROXs catalyse histidine 3S
hydroxylation, the positions of their substrate imidazoles are markedly
different (Fig. 4 and Extended Data Fig. 9). The positioning of the
hydroxylated methylenes relative to the metal differs substantially: in the overlaid
C5-carboxylate-binding residue (usually from βIV in JmjC-containing
enzymes) changes from an Arg (in YcfDs) to a Lys (in human ROXs and
JmjC-containing hydroxylases/KDMs) (Extended Data Fig. 4). b, Overlays
of the NO66/RmYcfD, NO66/MINA53, NO66/FIH and NO66/KDM4A
active-site views. The hydroxylated β-methylenes nearly superimpose in ROXs,
such that the oxidized C–H bonds (red arrows; 3-pro-R in L16 Arg 82 and
3-pro-S in RPL27A His 39 and RPL8 His 216) project towards the metal. The
spatial relationship of the hydroxylated C3/Nε-methyl carbon with respect
to the metal (and associated reactive oxidizing species) is conserved in ROXs
and the demethylases (for example, KDM4A), but not in FIH. Note the different
hydroxylation positions, but the similar orientation, of CAD Asn 803
(bound to FIH; hydroxylated) and RPL8 Asn 215 (bound to NO66;
not hydroxylated).


structures, the angle subtended at the metal by the Cβ atoms of RPL27A
His 39/RPL8 His 216 (human ROX substrates) and HIF-1α Asn 803 (FIH
substrate) is ~50° (Fig. 4 and Extended Data Fig. 9), demonstrating
that the reactive oxidizing intermediates (Fe(IV)=O) react from
different coordination positions in different oxygenases. Studies with
2OG-dependent halogenases have led to the proposal that iron-bound
reactive intermediates abstract a hydrogen from the substrate and deliver
a halogen or hydroxyl from different coordination positions to form
products. In contrast, our work implies that the coordination position
from which the ferryl-oxo species reacts, relative to the hydrogen to be
abstracted, varies between different JmjC-containing hydroxylases. Together with
other structural considerations, this observation has consequences for
the evolution of the JmjC-containing oxygenases.
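The ~50° figure is an angle measured at the metal, with its two arms pointing to the Cβ atoms of the ROX and FIH substrates in the overlaid structures. A minimal sketch of that vertex-angle calculation follows; the function name is hypothetical and the coordinates are placeholders constructed to give a 50° angle, whereas in practice they would be read from the superposed models.

```python
import math
from typing import Sequence


def angle_at_vertex(vertex: Sequence[float], p1: Sequence[float],
                    p2: Sequence[float]) -> float:
    """Angle p1-vertex-p2 in degrees (vertex = metal; p1, p2 = Cβ atoms)."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates (Å): metal at the origin, one Cβ from each overlaid model.
metal = (0.0, 0.0, 0.0)
cb_rox = (4.4, 0.0, 0.0)
cb_fih = (4.4 * math.cos(math.radians(50)), 4.4 * math.sin(math.radians(50)), 0.0)
print(round(angle_at_vertex(metal, cb_rox, cb_fih), 1))
```

The script prints `50.0`; only the directions from the metal matter, so the angle is independent of the 4.4 Å placeholder distances.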

RPL8 (NO66 substrate) has an Asn at the −1 position relative to the
hydroxylated His 216 (YcfD/MINA53 substrates have hydrophobic
residues at the analogous positions). The RPL8 Asn 215 methylene is only
slightly (0.5 Å) further from the metal than that of RPL8 His 216, revealing
the extreme sensitivity of oxygenase catalysis to geometric positioning.
There is a notable correlation in the binding of RPL8 Asn 215 and
HIF-1α Asn 803 to NO66 and FIH, respectively, even though one residue is
hydroxylated and one is not; the primary amides of both RPL8 Asn 215
and HIF-1α Asn 803 hydrogen bond with primary amides, namely NO66
Asn 376 and FIH Gln 239. Collectively, these observations reveal that
2OG-dependent oxygenases can evolve new activities not only by ‘directly’
altering the nature of enzyme-substrate interactions (including by
altering the directionality of substrate binding), but also by changing the
coordination position from which the ferryl intermediate reacts.

The combined structures reveal that the observed modes of ROX
hydroxylation have probably evolved into those of other JmjC-containing
hydroxylases and the KDMs, both by altering the coordination position
from which the ferryl-oxo reacts and by engineering the depth of
substrate penetration. Structurally informed phylogenetic analyses (Extended
Data Fig. 10), coupled to the observation that NO66 is more widely
distributed than FIH and MINA53, reveal that prokaryotic YcfDs evolved
into NO66, which is a branch point leading to the eukaryotic
JmjC-containing hydroxylases and demethylases. 2OG-dependent oxygenases
are among the most catalytically flexible of all enzyme families. Recent
work has revealed that FIH manifests remarkable catalytic promiscuity,
including the ability to oxidize Asn and His residues. Our structural
studies reveal that ROXs react with substrates through a different but
evolutionarily related binding mode to FIH. The catalytic capabilities
of 2OG-dependent oxygenases for protein oxidation thus probably
extend beyond those presently identified.


METHODS SUMMARY 


Recombinant human MINA53 and NO66 and bacterial EcYcfD and RmYcfD were
produced in E. coli and purified by metal affinity/cation exchange and size-exclusion
chromatography. Assays comprised incubation with Fe(II), 2OG and substrate
followed by mass spectrometry and/or 2OG turnover assays. Crystals were grown by
vapour diffusion (Supplementary Table 1) and cryo-cooled in liquid nitrogen. Data
were collected on Swiss Light Source X10SA, European Synchrotron Radiation
Facility BM16 and Diamond Light Source MX beamlines. MINA53 and EcYcfD
structures were solved by single-wavelength anomalous diffraction or by single
isomorphous replacement with anomalous scattering using SeMet derivatives. The
NO66 structure was solved by molecular replacement (MR) using the MINA53
JmjC domain. Phases for the substrate complex structures were obtained by MR using
apo structures (MINA53 (PDB accession 4BU2), NO66 (PDB accession 4DIQ);
Supplementary Tables 2–4).


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Recombinant protein production and enzyme assays. Complementary DNA
sequences encoding N-terminally truncated MINA53 (amino acids 26–465) and
NO66 (amino acids 183–641) were PCR amplified from the Mammalian Gene
Collection (MGC; accession numbers BC014928 and BC011350, respectively) and
cloned into the pNIC28-Bsa4 vector. Full-length EcYcfD was cloned into the pET-28a(+)
vector (Novagen) as previously described. The RmYcfD gene (NCBI gene
accession number 8566662) was amplified by PCR from genomic DNA of R. marinus
and cloned into the pGEM-T Easy vector and then into pET-28a(+). Stratagene’s
QuickChange site-directed mutagenesis kit was used to make all ROX mutations
using the above constructs as templates.

Wild-type ROX enzymes and variants were produced as native His6-tagged
proteins in E. coli BL21(DE3) as described. For crystallization experiments,
selenomethionine (SeMet)-derivatized enzymes, SeMet-MINA53 and SeMet-EcYcfD,
were produced in E. coli BL21(DE3)-R3-pRARE2 and BL21(DE3) strains,
respectively. In general, cells were grown in LeMaster medium (alternatively in
SelenoMethionine Medium Base plus Nutrient Mix) supplemented with SeMet (40–50 mg ml−1)
and kanamycin (30 µg ml−1) at 37 °C (with shaking at 200 r.p.m.) until an optical
density at 600 nm (OD600) of 1.2 (SeMet-MINA53) or 0.6 (SeMet-EcYcfD) was
reached. Protein expression was then induced with 0.2 mM (SeMet-MINA53) or
1.0 mM (SeMet-EcYcfD) isopropyl-β-D-thiogalactoside (IPTG) and allowed to
continue for 18 h at 18 °C. All native and SeMet-derivatized proteins were purified from cell
lysates using immobilized Ni2+ affinity chromatography with gradient elution using
imidazole and/or ion-exchange chromatography. For YcfDs, imidazole was removed
by buffer exchange into 50 mM HEPES-Na pH 7.5 using a PD10 desalting column,
followed by further purification using Q-Sepharose HP (EcYcfD) or SourceQ 16
(RmYcfD) anion exchange chromatography. For MINA53 and NO66, the His6 tag
was removed by incubation with TEV protease, followed by a final purification step
using size-exclusion chromatography in 50 mM HEPES-Na pH 7.5, 500 mM NaCl,
5% (v/v) glycerol, 0.5 mM tris(2-carboxyethyl)phosphine (TCEP). Proteins were
concentrated to 10–30 mg ml−1 and were of >95% purity, as determined by
SDS-PAGE. All columns were supplied by GE Healthcare. Assays were performed as
described.
Crystallization, data collection and processing. Crystals of MINA53, NO66, EcYcfD 
and RmYcfD complexes were grown as described in Supplementary Table 1. In 
general, crystals were cryoprotected by transferring to a solution of mother liquor 
supplemented with 20% (v/v) ethylene glycol (MINA53/NO66) or 25% (v/v) glycerol 
(YcfDs) before being cryo-cooled in liquid nitrogen. 

As described in Supplementary Tables 2-4, data on native and SeMet-derivatized 
crystals were collected at 100 K using synchrotron radiation at the Swiss Light 
Source (SLS) beamline X10SA, European Synchrotron Radiation Facility (ESRF) 
beamline BM16 and Diamond Light Source (DLS) beamlines. The data were pro- 
cessed as outlined in Supplementary Tables 2-4. 

Structure solution and refinement 

MINA53 structures. SHAKE-AND-BAKE was used to identify five Se positions
in the SeMet-MINA53-NOG data set (P4₃32 space group); refinement of heavy-atom
parameters and phasing were carried out with SHARP using the single isomorphous
replacement with anomalous scattering (SIRAS) method, with MINA53-NOG
(native) as the native and SeMet-MINA53 as the derivative data set (Supplementary
Table 2). The electron density map after density modification with SOLOMON was
of good quality; automated model building with ARP/wARP resulted in a >80%
complete model with one MINA53 molecule per asymmetric unit, which
corresponds to an unusually high solvent content of ~75%. Refinement was carried out
with BUSTER and, after several cycles of manual rebuilding with COOT, the
model converged to 19.7% Rcryst and 22.9% Rfree. Atomic coordinates and
structure factors for this structure are deposited in the PDB under
accession number 2XDV.
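The link between one molecule per asymmetric unit and a ~75% solvent content can be illustrated with the Matthews coefficient, V_M = V_cell / (n_sym · n_asu · M), and the usual approximation solvent fraction ≈ 1 − 1.23/V_M (which assumes a typical protein partial specific volume). By this approximation, 75% solvent corresponds to V_M ≈ 4.9 Å³ Da⁻¹. The sketch below uses hypothetical cell-edge and molecular-mass values, because neither is given in this section; P4₃32 has 24 symmetry-equivalent positions.

```python
def matthews_coefficient(cell_volume_a3: float, n_sym_ops: int,
                         mols_per_asu: int, mw_da: float) -> float:
    """Matthews coefficient V_M in Å^3 per dalton."""
    return cell_volume_a3 / (n_sym_ops * mols_per_asu * mw_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, 1 - 1.23 / V_M (Matthews approximation)."""
    return 1.0 - 1.23 / vm


# Hypothetical inputs: an assumed cubic cell edge and an assumed ~50 kDa monomer.
a = 180.0                      # Å, hypothetical cell edge
volume = a ** 3                # cubic cell volume in Å^3
vm = matthews_coefficient(volume, n_sym_ops=24, mols_per_asu=1, mw_da=50_000)
print(f"V_M = {vm:.2f} Å^3/Da, solvent ≈ {solvent_fraction(vm):.0%}")
```

Swapping in the real cell dimensions and construct mass would turn this into a direct check of the ~75% figure.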

The SeMet-MINA53-2OG structure was solved using phases from a highly
redundant single-wavelength anomalous dispersion (SAD) data set collected around
the Se absorption edge. Using Patterson seeding and dual-space direct methods,
SHELXD (SHELXC/D/E pipeline; CCP4 suite) located six out of eight possible Se
sites. Refinement of the substructure solution followed by density modification with
SHELXE resulted in good-quality initial phases to 2.8 Å resolution. Automated
model building with BUCCANEER resulted in a model in which core regions,
including the JmjC and dimerization domains, were built. Iterative refinement using CNS
1.3 (ref. 40) and model building using COOT continued until the R factor was around
30%. Final rounds of manual fitting using COOT and refinement using a
combination of CNS 1.3 (ref. 40) and PHENIX continued until Rcryst/Rfree no longer
improved (Supplementary Table 2). This structure (deposited in the PDB under
accession number 4BU2) was then used as a search model to solve the structure of
MINA53(Y209C) in complex with RPL27A(G37C) by molecular replacement (MR)
with PHASER (P2₁2₁2₁ space group, resolution 2.05 Å). The quality of all MINA53
structures was validated using MOLPROBITY, with >95% of the residues in the
favoured region of the Ramachandran plot.

NO66 structures. A truncated form of MINA53 (amino acids 30–260),
comprising the JmjC domain, was used as a search model for MR using PHASER.
The two molecules in the asymmetric unit of NO66 were readily located, but the
electron density away from the JmjC core of NO66 was ambiguous. Density
modification with RESOLVE, as implemented in PHENIX, which took advantage of
the two-fold non-crystallographic symmetry (in a P2₁2₁2 space group), led to a marked
map improvement and allowed automated model building with BUCCANEER.
Refinement was carried out with REFMAC; after several cycles of manual
rebuilding with COOT, the model converged to 18.5% Rcryst and 23.1% Rfree. Atomic
coordinates and structure factors for this structure are deposited in the PDB under
accession number 4DIQ. The remaining NO66 structures, including those in
complex with the substrate RPL8, were solved in P2₁ or C2 space groups (resolution
2.15–2.50 Å) with 2–4 molecules per asymmetric unit (Supplementary Table 3), using the
NO66/P2₁2₁2 structure (PDB accession 4DIQ) as a search model. Iterative rounds
of model building using COOT and refinement using PHENIX and/or CNS 1.3
(ref. 40) were performed until Rcryst and Rfree converged
(Supplementary Table 3). All residues were in acceptable regions of Ramachandran
plots as calculated by MOLPROBITY.
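The Rcryst and Rfree values reported throughout these Methods are both agreement indices of the form R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|: Rcryst is computed over the reflections used in refinement and Rfree over a small test set excluded from refinement. The sketch below implements that textbook definition with a single overall scale factor k; the refinement programs named above apply more elaborate scaling, and the function names and amplitudes here are hypothetical.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], scale: float = 1.0) -> float:
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|) over the given reflections."""
    num = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_free(f_obs: Sequence[float], f_calc: Sequence[float],
                free_flags: Sequence[bool], scale: float = 1.0) -> Tuple[float, float]:
    """Split reflections by the test-set flag and return (Rcryst, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free


# Hypothetical amplitudes for six reflections; the last two form the test set.
fobs = [120.0, 85.0, 60.0, 40.0, 95.0, 55.0]
fcalc = [110.0, 90.0, 58.0, 45.0, 80.0, 62.0]
flags = [False, False, False, False, True, True]
print(r_work_free(fobs, fcalc, flags))
```

Keeping the test set out of refinement is what makes Rfree an unbiased check; a gap between the two values, as in the deposited structures above, is expected.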

YcfD structures. SOLVE was used to locate 17 out of 22 possible Se sites using the
SeMet-EcYcfD data set. Eight pairs of sites were related by non-crystallographic
symmetry. The initial electron density map after solvent-flattening density modification
with RESOLVE was of good quality, and automated model building resulted in a
model in which core regions (60% of residues in the crystallized protein’s sequence) of
both molecules in the asymmetric unit were built. Refinement and fitting cycles were
performed using PHENIX and COOT and converged to final values of 19.5% Rcryst and
25.0% Rfree. Phasing and refinement statistics are summarized in Supplementary
Table 4. Structures of RmYcfD in complex with (1-chloro-4-hydroxyisoquinoline-3-carbonyl)glycine
(IOX3) (ref. 46) or substrate L16 were solved by MR using the
EcYcfD structure as the search model. Structural refinement was carried out with
PHENIX, with iterative rebuilding of the models using COOT, until Rcryst/Rfree
converged to final values (Supplementary Table 4).
Extended Data Figure 1 | Schematic protein topologies of ROXs and related 2OG-dependent oxygenases. a-f, Protein topologies of MINA53-Mn-2OG-RPL27A(32-50) (a), NO66-Mn-NOG-RPL8(205-224) (b), RmYcfD-Mn-NOG-L16(72-91) (c), FIH-Fe-NOG-HIF-1α(786-826) (PDB accession 1H2K) (d), PHF8-Fe-NOG-H3K4me3K9me2(2-25) (PDB accession 3KV4) (e) and KDM4A-Ni-NOG-H3K9me2(7-14) (PDB accession 2OX0) (f) (substrates are not shown). DSBH core elements, labelled βI-βVIII, are in green, helices in cyan, additional β-strands in red, random coils in black and the insert between the fourth and fifth β-strands in blue. Note that not all the DSBH oxygenases maintain antiparallel hydrogen-bond pairing between βII and βVII, even though the φ/ψ angles (βII) are within the β-region of the Ramachandran plot. Figures were generated using TopDraw.
Extended Data Figure 2 | ROX dimerization domains. a, Comparison of the dimerization domains in ROXs and FIH. b, Intermolecular interactions observed at dimerization interfaces (monomer A, grey; monomer B, yellow). Validation of the functional relevance of the ROX dimers comes from biochemical and kinetic studies demonstrating loss of activity for most variants. The dimer interfaces in the ROXs are related to that of FIH; we propose that the FIH dimerization fold evolved from that of ROXs. The large buried surface area (>3,000 Å²) within all ROX dimerization domains is sufficient for dimerization in solution, as reported for NO66 (ref. 49). The interactions observed in dimerization include both hydrogen bonds/electrostatic interactions and hydrophobic interactions. In the EcYcfD/RmYcfD dimerization domains, residues involved in hydrophobic interactions are mainly from α2 and are well conserved (RmYcfD residues in parentheses): Phe 214 (Met 223), Val 242 (Ile 250), Met 247 (Leu 255), Leu 250 (Ile 258), Met 253 (Leu 261), Met 254 (Leu 262), Leu 257 (Leu 265), Ile 258 (Ile 257).
Hydrogen bonding/electrostatic interactions are more important in RmYcfD dimerization than in EcYcfD/human ROXs. The network of hydrogen bonds between RmYcfD monomers A and B (Asp 256-Arg 269-Gln 259-Asp 267-Arg 263) creates, owing to the two-fold symmetry, a total of eight hydrogen bonds. In EcYcfD, Leu 255 (Arg 263 in RmYcfD) is positioned at the centre of the equivalent network. Furthermore, in RmYcfD, Gln 216 is positioned to hydrogen bond with the backbone amide N of Arg 234 and the carbonyl O of Leu 261. Hydrogen bonding in EcYcfD dimerization is less extensive: only the Asn 226 amide N is positioned to form a hydrogen bond to the hydroxyl O of Thr 207, and Arg 208 hydrogen bonds with the carbonyl O of Gly 224. However, hydrophobic/aromatic clusters are involved in EcYcfD dimerization, including the side chains of Leu 210, Leu 223, Tyr 217 (α1), Phe 264, Trp 267, Phe 268 and Phe 271 (α3) from monomer A and Val 242, Met 247 and Leu 250 (α2) from monomer B. As in the YcfDs, in NO66 there is only one apparent salt-bridge interaction at the dimer interface, between Arg 474 and Asp 495 (Arg 474 NH1-Asp 495 Oδ1, 2.9 Å; Arg 474 NH2-Asp 495 Oδ2, 2.7 Å), which links the α2 and α3 helices of opposite monomers. Similarly, a 'complex salt bridge' is observed in MINA53 between Arg 313 and Glu 320/Asp 317 (Arg 313 NH1-Glu 320 Oε2, 3.2 Å; Arg 313 NH2-Glu 320 Oε1, 2.7 Å; Arg 313 NH2-Asp 317 Oδ1, 2.9 Å) that connects the α2 helices of different monomers.
Backbone amide hydrogen bonding additionally occurs between the NO66 residues Asn 426 and Leu 454, Arg 452 and Trp 428, and Phe 450 and Gly 429. MINA53 also has backbone-to-side-chain interactions between residues from the flexible loops connecting the α1-α2 and α2-α3 helices (Gln 297 O-Lys 331 Nζ; Ser 300 Oγ-Glu 324 O, 3.1 Å). The role of hydrophobic/aromatic clusters in dimerization is apparent in NO66, where the α2 helices from different monomers are further apart than in the YcfDs and MINA53 and hence bury less surface area. However, in NO66, an apparent hydrophobic cluster forms between the N-terminal part of α1 and the C-terminal part of α2. NO66 Trp 428 (Trp 264 in MINA53) is positioned at the start of the α1 helix of monomer A and forms the centre of a hydrophobic cluster, interacting with Phe 431, Ile 435 and Leu 432 on monomer A, and Val 481, Leu 484, Met 462, Phe 477 and Pro 455 on monomer B. NO66 Trp 428 also forms an apparent cation-π interaction with Lys 480. The similarly positioned Trp 264 in MINA53 makes hydrophobic contacts with Phe 267 and Leu 268 of the same monomer and with Ile 290, Pro 291 and Leu 294 of the other monomer, in addition to a cation-π interaction with Arg 307. Other hydrophobic contacts in MINA53 dimerization involving the α1 and α2 helices of different monomers include those between the side chains of Leu 308/α2 (interacting with Leu 319/α2 and Phe 267, Leu 268 and Thr 271 of α1), Leu 312/α2 (interacting with Ile 272/α1 and Leu 315/α2) and Phe 277/α1 (interacting with Val 276, Leu 269 and Ile 272 of α1). Disruption of ROX dimerization leads to loss of activity, as observed for the MINA53(R313E) and EcYcfD(I211R) variants, as well as for truncated MINA53 (residues 1-265 and 1-299) lacking the dimerization and C-terminal domains. Non-denaturing gel electrophoresis of ROX oligomerization states in solution demonstrates disruption of dimerization in EcYcfD(I211R) and MINA53(R313E). The loss of activity on destabilizing ROX dimerization is reminiscent of the role of FIH dimerization in catalysis (an FIH(L340R) variant that was predominantly monomeric is inactive). Data show mean and standard error of the mean (s.e.m.) (n = 3).
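The legends report activity data as mean and s.e.m. of three replicates. As a reminder of that arithmetic, the s.e.m. is the sample standard deviation divided by √n. The sketch below assumes the (n − 1) sample standard deviation, which is the usual convention but is not stated in the legend, and uses made-up replicate values.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements.

    Uses the sample standard deviation (n - 1 denominator), so SEM = s / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate the s.e.m.")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical triplicate % hydroxylation values (not data from this study).
m, s = mean_and_sem([62.0, 58.0, 66.0])
print(f"{m:.1f} ± {s:.1f} % (n = 3)")  # 62.0 ± 2.3 % (n = 3)
```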
Extended Data Figure 3 | Interaction of the ROX C-terminal WH domains with their respective ribosomal protein substrates. a-e, The figure shows how ROX C-terminal domains interact with their substrates. A DALI search indicates that a close structural homologue of the ROX C-terminal domain is the 'peptide clamp' (WH) domain of MccB, an enzyme involved in the biosynthesis of the microcin C7 antibiotic. WH domains, a subtype of the helix-turn-helix (HTH) family, are nucleic acid/protein-interacting domains and occur in different cellular pathways, from transcriptional regulation to RNA processing. Although the overall negative charge of ROX WH domains suggests that they may not directly interact with nucleic acids, it is notable that the prokaryotic ribosomal protein L6, which is located proximal to L16 in intact ribosomes, and the transcriptional regulator PhoP contain WH folds; the latter is interesting because in the E. coli K12 genome the ycfD gene is
located adjacent to those for the PhoP/PhoQ two-component signalling system, which is involved in stress responses. a, General topology of the C-terminal WH domain, showing two distinct binding sites for L16 (yellow) and RPL27A (magenta)/RPL8 (orange) involving residues either from an N-terminal loop connecting the WH and dimerization domains (as in RmYcfD) or from an extended loop between WH β3 and β4 (as in human ROXs). b-e, Comparisons between the WH domains in MccB (b), MINA53 (c), NO66 (d) and RmYcfD/EcYcfD (e), showing the interactions observed between this domain and the substrate(s). Note that although the RPL27A and RPL8 substrates make hydrophobic contacts with the WH domains in MINA53 (Met 405 and Met 406) (c) and NO66 (Val 576 and Tyr 577) (d), RmYcfD uses Arg 285 to form a hydrogen bond with L16 Met 83 (RmYcfD Arg 285 NH2-L16 Met 83 O, 2.5 Å) (e). Right panels show the partial loss of activity on substitution of WH-domain residues in MINA53 (M405A), NO66 (Y577A) and EcYcfD (H277C). Data show mean and s.e.m. (n = 3).
[Figure panels: b, MccB-MccA (PDB 3H9G) and MccB-Mg-ATP (PDB 3H5N); right panels, % hydroxylation and kcat/Km (M⁻¹ s⁻¹) for wild type (WT) versus MINA53 M405A (MINA53/RPL27A 1:100 and 1:10), NO66 Y577A (NO66/RPL8 1:20 and 1:10) and EcYcfD H277C.]
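The right panels plot % hydroxylation. Because hydroxylation adds one oxygen atom (+16 Da), one common way to estimate it from mass spectra is the fraction of substrate signal found at M + 16. The legend does not give the exact formula, so the sketch below assumes % hydroxylation = I(M+16) / (I(M) + I(M+16)) × 100, equal ionization efficiencies, and a hypothetical peptide mass, peak list and matching tolerance.

```python
OXYGEN_MASS = 15.995  # monoisotopic mass of O (Da): the +16 Da hydroxylation shift


def percent_hydroxylation(peaks, substrate_mass, tol=0.5):
    """Estimate % hydroxylation from (mass, intensity) peaks.

    Assumes only the unmodified (M) and singly hydroxylated (M + 16) species
    contribute and that both ionize equally well.
    """
    unmodified = sum(i for m, i in peaks if abs(m - substrate_mass) <= tol)
    hydroxylated = sum(
        i for m, i in peaks if abs(m - (substrate_mass + OXYGEN_MASS)) <= tol
    )
    total = unmodified + hydroxylated
    if total == 0:
        raise ValueError("no substrate peaks found within tolerance")
    return 100.0 * hydroxylated / total


# Hypothetical peptide of mass 2000.00 Da; intensities are made up.
peaks = [(2000.01, 3.0e5), (2016.00, 7.0e5), (2100.0, 1.0e4)]
print(round(percent_hydroxylation(peaks, 2000.00), 1))  # 70.0
```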
Extended Data Figure 4 | Comparison of 2OG/co-substrate binding in ROXs and representative 2OG-dependent oxygenases. The identity of the basic residue (Arg or Lys) that binds the 2OG C5 carboxylate via electrostatic interactions is indicated, along with the DSBH strand (I-VIII) on which it is located. The occurrence and positioning of the basic Arg/Lys is characteristic of each subfamily. 2OG binding also involves other polar residues, including hydroxyl-bearing residues: a Ser (βVIII, part of the RXS motif present in, for example, DAOCS, ANS, FTO and algal P4H), a Thr (βII, as in some KDMs such as JMJD3, JMJD6, PHF8 and UTX) or a Tyr (non-DSBH β-strand, as in FIH, KDM4A, ABH2 and PHD2), and sometimes water molecule(s) (reviewed in refs 15,56,57). In a position analogous to the serine of the RXS motif (βVIII), human ROXs have histidine residues, MINA53 His 253/NO66 His 417 (βVIII), which form part of a hydrogen-bond network involving MINA53 Thr 255/NO66 Thr 419 (βVIII), a water molecule and the 2OG carboxylates. Although EcYcfD/RmYcfD has Asn 197/Thr 206 at this position (βVIII), it is the conserved serine from βI (Ser 114 in EcYcfD and Ser 122 in RmYcfD) that is positioned to hydrogen bond with the 2OG C5 carboxylate.
Extended Data Figure 5 | Human ROX-substrate complexes showing disulphide crosslinking sites and difference electron density for the substrate residues. a, Strategy adopted to obtain the crosslinked structures (the same strategy can be used for other protein hydroxylases/KDMs).

b-d, Different disulphide crosslinking sites (red arrows) that form NO66-RPL8 cysteine-disulphide pairs under equilibrating conditions.
Analyses of 2OG-oxygenase-substrate complexes reveal that substrate residues at +2 positions relative to the hydroxylated residues make interactions with enzyme residues within a ~12 Å radius of the metal. To obtain stable NO66-RPL8 complexes, we engineered NO66 variants with Cys substitutions within a ~12 Å radius of the metal, at positions considered likely to be involved in substrate binding on the basis of analyses of other 2OG-oxygenase-substrate structures and evolutionary/phylogenetic analyses of NO66/NO66-like proteins in eukaryotes. We also introduced Cys substitutions at +2 positions in the peptide substrate sequence, relative to the hydroxylated residue. Electrospray ionization mass spectrometry (ESI-MS) assays were used to identify the NO66-RPL8 pairs giving the best crosslinking yields under equilibrating conditions. The following crosslinked pairs were used for crystallization: wild-type NO66 with RPL8(G220C), a double NO66 variant L299C/C300S with RPL8(G220C), and a single NO66 variant S373C with RPL8(G214C). Structures were obtained for wild-type NO66-RPL8(G220C) (complex 1; b), NO66(L299C/C300S)-RPL8(G220C) (complex 2; c) and NO66(S373C)-RPL8(G214C) (complex 3; d) in combination with NOG/Mn(II), in space group C2 at 2.25-2.50 Å resolution with two molecules per asymmetric unit; RPL8 residues 215-223 (complex 1), 213-223 (complex 2) and 212-223 (complex 3) were observed bound to the NO66 active site.
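The Cys-placement strategy above begins by listing residues within about 12 Å of the active-site metal. A minimal sketch of that distance filter is shown below; it assumes atom coordinates have already been parsed from a PDB/mmCIF file into simple records, and the `Atom` record, residue names, residue numbers and coordinates are all hypothetical.

```python
import math
from typing import Iterable, NamedTuple, Set, Tuple


class Atom(NamedTuple):
    chain: str
    res_name: str
    res_num: int
    atom_name: str
    xyz: Tuple[float, float, float]


def residues_near_metal(
    atoms: Iterable[Atom], metal_xyz: Tuple[float, float, float], cutoff: float = 12.0
) -> Set[Tuple[str, str, int]]:
    """Return (chain, residue name, residue number) for every residue with at
    least one atom within `cutoff` angstroms of the metal position."""
    hits = set()
    for atom in atoms:
        if math.dist(atom.xyz, metal_xyz) <= cutoff:
            hits.add((atom.chain, atom.res_name, atom.res_num))
    return hits


# Hypothetical coordinates (angstroms) with the metal at the origin.
atoms = [
    Atom("A", "SER", 10, "OG", (5.0, 4.0, 3.0)),    # ~7.1 A from the metal
    Atom("A", "LEU", 20, "CD1", (8.0, 6.0, 6.0)),   # ~11.7 A from the metal
    Atom("A", "TRP", 30, "CZ2", (20.0, 1.0, 0.0)),  # ~20.0 A from the metal
]
print(sorted(residues_near_metal(atoms, (0.0, 0.0, 0.0))))
# [('A', 'LEU', 20), ('A', 'SER', 10)]
```

In practice the resulting shortlist would still be filtered by the structural and phylogenetic criteria described in the legend; the distance cut-off only narrows the search.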

e, Superimposition of the three complex structures. Note that the key RPL8 residues (215-219), including the hydroxylated His 216, are observed in near-identical conformations (r.m.s.d. 0.29-0.36 Å for Cα atoms); the similarity of the substrate positions in all three NO66 structures suggests that they probably all represent functional complexes. On the basis of the NO66-RPL8 structures, we identified a MINA53 residue suitable for crosslinking, Tyr 209; the MINA53(Y209C) variant was crystallized in complex with RPL27A(G37C) (g). Fo − Fc omit electron-density maps contoured at 3σ are shown as green (RPL8) and grey (RPL27A) meshes around the substrate residues. To test whether the wild-type/variant enzymes and altered substrates still function catalytically, we carried out endpoint and time-course assays using variable enzyme-to-substrate ratios.
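The r.m.s.d. values quoted for Cα atoms are root-mean-square deviations between matched atoms after superposition, RMSD = sqrt((1/N) Σ |xᵢ − yᵢ|²). The sketch below assumes the structures have already been superimposed by a separate least-squares fit (not shown) and only computes the deviation; the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical CA coordinates (angstroms) for five matched residues.
ca_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
ca_2 = [(0.3, 0.0, 0.0), (3.8, 0.3, 0.0), (7.6, 0.0, 0.3), (11.1, 0.0, 0.0), (15.2, -0.3, 0.0)]
print(f"{rmsd(ca_1, ca_2):.2f} A")  # 0.30 A
```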

f, h, The biochemical data show that for both wild-type NO66 (f) and MINA53 (h) (wild type and Y209C), all the Cys-substituted peptides function as substrates. In the case of MINA53, the Y209C variant with which we obtained the MINA53-RPL27A complex structure is approximately fourfold more active than wild-type MINA53. Data are mean and s.e.m. (n = 3). We also used ESI-MS to test wild-type NO66 for reaction between enzyme cysteines and the cysteines of modified substrate peptides. Despite testing multiple combinations, we observed disulphide formation only in cases for which we were also able to obtain crystal structures of substrate complexes. All possible combinations of human ROX wild type or variants and peptides containing Cys at variable positions were used for the cross-reactivity tests: NO66: wild type, R297C, L299C/C300S, S373C, S421C; RPL8: wild type, G214C, H218C and G220C; MINA53: wild type and Y209C; RPL27A: wild type and G37C. The combined activity and MS analyses suggest that, to form stable/crystallizable crosslinked complexes, the substrates need to be recognized by the enzyme active sites in a catalytically relevant manner (a).
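ESI-MS can flag a disulphide-linked enzyme-peptide complex because its mass should equal the sum of the two partners minus two hydrogen atoms (about 2.016 Da, average mass) per S-S bond formed from two thiols. The sketch below computes that expected mass; the input masses and the 2 Da matching tolerance are hypothetical placeholders, not values from this study.

```python
H_AVG = 1.008  # average mass of a hydrogen atom (Da)


def disulphide_complex_mass(enzyme_mass: float, peptide_mass: float, n_bonds: int = 1) -> float:
    """Expected average mass of a complex joined by n disulphide bonds.

    Each S-S bond formed from two free thiols releases two hydrogen atoms.
    """
    return enzyme_mass + peptide_mass - n_bonds * 2 * H_AVG


def matches(observed: float, expected: float, tol_da: float = 2.0) -> bool:
    """Hypothetical acceptance window for a deconvoluted ESI-MS peak."""
    return abs(observed - expected) <= tol_da


# Hypothetical masses (Da) for an enzyme variant and a Cys-containing peptide.
expected = disulphide_complex_mass(50000.0, 2100.0)
print(round(expected, 3))          # 52097.984
print(matches(52098.5, expected))  # True
```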
Extended Data Figure 6 | Mutagenesis analyses of the substrate-binding residues located on the JmjC catalytic domains of MINA53, NO66 and RmYcfD. a-c, MINA53 (a), NO66 (b) and RmYcfD (c) are shown in colour-coded sticks. Left panels show views from the active sites of ROX-substrate complexes and the right panels show the effects of mutations on ROX catalysis. Data are mean and s.e.m. (n = 3). Analyses of ROX-substrate complexes reveal important interactions between ROXs and their ribosomal protein substrates. With human ROXs, the binding of ribosomal RPL27A His 39 (light blue)/RPL8 His 216 (orange) involves a series of hydrogen bonds to backbone amides and the side chains of MINA53/NO66 residues: MINA53 Gln 136/NO66 Arg 297, MINA53 Asn 165/NO66 Asn 326, MINA53 Tyr 167/NO66 Tyr 328 and MINA53 Ser 257/NO66 Ser 421. In addition, in the MINA53-RPL27A complex, Leu 38 and Arg 42 of RPL27A make hydrophobic contacts with MINA53 Leu 176 and a salt-bridge interaction with MINA53 Asp 333, respectively. We produced variants of all these residues to investigate their roles in substrate binding. The results of the endpoint assays, as well as kinetic studies on the variants (right panels), show that substitution of these residues causes substantial losses of activity. c, In the case of RmYcfD, the hydroxylated residue L16 Arg 82 binds in a hydrophobic cleft lined by the RmYcfD Tyr 137 and Met 120 side chains and hydrogen bonds to RmYcfD Asp 118 and Ser 208. To test the crystallographically observed binding mode, variants of the RmYcfD residues (Asp 118, Met 120, Tyr 137 and Ser 208, highlighted) were prepared in EcYcfD (corresponding to Asp 110, Met 112, Tyr 129 and Ser 199, respectively). Mutagenesis studies on all ROXs support the crystallographically observed binding modes of the substrate residues. The combined biochemical and structural data also provide insights into the substrate selectivity of ROXs over other oxygenases.
Extended Data Figure 7 | Conformational changes on substrate binding in ROXs. a-c, Conformational changes at the domain and residue levels in MINA53 (dark salmon and red with/without RPL27A, light blue) (a), NO66 (slate and cyan with/without RPL8, orange) (b) and RmYcfD (grey and split pea with/without L16, yellow) (c). Although the overall movement of the C-terminal WH domain on substrate binding is more pronounced in MINA53 than in the other ROXs, the RmYcfD structures with and without substrate show marked local changes in the side chains of substrate-binding residues (see below). a, The inset highlights local changes to the active-site region in MINA53 in the presence (green sticks) or absence (yellow sticks) of substrate; MINA53 uses an acidic residue, Asp 333, located on an α-helix connecting the dimerization and WH domains, to form a catalytically important salt-bridge interaction with RPL27A Arg 42. Support for this statement comes from activity analyses on variants of both RPL27A and MINA53. We have previously reported that mutation of Arg 42 in RPL27A to Ala results in <5% hydroxylation. The D333A variant of MINA53 almost completely ablates hydroxylation of native RPL27A at all tested substrate:enzyme ratios (Extended Data Fig. 6). In the substrate-unbound form, MINA53 Asp 333 has two alternative conformations, indicating flexibility. The NO66 substrate RPL8 has an Ile 219 at the position analogous to Arg 42 of RPL27A, which makes hydrophobic contacts with the Tyr 577 side chain from the WH domain of NO66 (b). In the case of RmYcfD, the substrate-interacting residues located on the βII-βIII loop (Tyr 137), the βIV-βV insert (Arg 169), the dimerization domain (Arg 212 and Glu 218) and the loop connecting the dimerization and WH domains (Arg 284) are observed in different conformations in the structures with and without substrate, probably reflecting induced fit on substrate binding (c). Substitutions of these residues have variable effects on ROX catalysis (Extended Data Fig. 6).
Extended Data Figure 8 | Comparison of YcfDs from E. coli and R. marinus. a-d, Differences between YcfDs from E. coli (green) and R. marinus (grey) are shown. a, Superimposition of EcYcfD and RmYcfD-L16 complex structures showing crystallographically observed differences, particularly in the dimerization and βIV-βV loop regions. The βIV-βV insert is highlighted in crimson red and pink in EcYcfD and RmYcfD, respectively. b, Residue numbering is according to RmYcfD, with the EcYcfD numbering shown in brackets. Note that all of the directly identified substrate-binding residues are strictly conserved between EcYcfD and RmYcfD. However, some residues, particularly those located on the βIV-βV insert, including Asp 118, Tyr 137 and Arg 212 in RmYcfD (Asp 110, Tyr 129 and Arg 203 in EcYcfD), are observed in different conformations, suggesting potential roles for these residues in catalysis. c, d, Predicted binding mode of L16 (yellow) to EcYcfD (green). A model complex of EcYcfD with Mn(II), NOG and L16 (residues Pro 77-Lys 84) was generated using EcYcfD-SeMet as the template and by comparison with the RmYcfD-L16 and MINA53-RPL27A(32-50) structures. d, Surface representations of the EcYcfD-Mn-NOG-L16(77-84) complex, predicting key hydrogen-bond/polar interactions (dotted lines) with L16. The hydroxylated L16 Arg 81 is predicted to bind in a pocket defined by the Tyr 129 and Met 112 side chains, which probably form cation-π and hydrophobic interactions with the L16 Arg 81 side chain, as observed in the RmYcfD-L16 crystal structure. The Arg 81 guanidino group is predicted to make electrostatic interactions with the EcYcfD Asp 110 carboxylate and hydrogen bonds to EcYcfD Ser 199. EcYcfD residues Asp 110, Met 112, Tyr 129 and Ser 199 were substituted to test the predicted mode of binding; the assay results are given in Extended Data Fig. 6c.
Extended Data Figure 9 | Comparison of active-site chemistry of ROXs and related enzymes. The figure compares active-site chemistry in representative 2OG-dependent oxygenases and the directionality of peptide substrate binding through the active site. Red/blue arrows indicate hydroxylation/demethylation sites. The active-site metals (Fe/Fe surrogates, Mn or Ni) are shown as colour-coded spheres.

Gene | Protein name | UniProtKB accession number
FBXL11/JHDM1A/KDM2A | F-box and leucine-rich repeat protein 11 | Q9Y2K7
FBXL10/JHDM1B/KDM2B | F-box and leucine-rich repeat protein 10 | Q8NHM5
JMJD1A/KDM3A | Jumonji domain containing 1A | Q9Y4C1
JMJD1B/KDM3B | Jumonji domain containing 1B | Q7LBC6
JMJD1C/JHD2C | Jumonji domain containing 1C | Q15652
JMJD2A/JHDM3A/KDM4A | Jumonji domain containing 2A | O75164
JMJD2B/KDM4B | Jumonji domain containing 2B | O94953
JMJD2C/KDM4C | Jumonji domain containing 2C | Q9H3R0
JMJD2D/KDM4D | Jumonji domain containing 2D | Q6B0I6
JMJD2E/KDM4E | Jumonji domain containing 2E | B2RXH2
JARID1A/RBBP2/KDM5A | Jumonji, AT rich interactive domain 1A | P29375
JARID1B/KDM5B | Jumonji, AT rich interactive domain 1B | Q9UGL1
JARID1C/KDM5C | Jumonji, AT rich interactive domain 1C | P41229
JARID1D/KDM5D | Jumonji, AT rich interactive domain 1D | Q9BY66
UTX/KDM6A | Ubiquitously transcribed tetratricopeptide repeat protein, chromosome X | O15550
UTY | Ubiquitously transcribed tetratricopeptide repeat protein, chromosome Y | O14607
JMJD3/KDM6B | Jumonji domain containing 3 | O15054
KIAA1718/JHDM1D/KDM7 | Lysine-specific demethylase 7 | Q6ZMT4
PHF8/KIAA1111 | PHD finger protein 8 | Q9UPP1
PHF2/JHDM1E | PHD finger protein 2 | O75151
JMJD4 | Jumonji domain containing 4 | Q9H9V9
JMJD5 | Jumonji domain containing 5 | Q8N371
JMJD6 | Jumonji domain containing 6 | Q6NYC1
JMJD7 | Jumonji domain containing 7 | P0C870
JMJD8 | Jumonji domain containing 8 | Q96S16
JARID2/JMJ | Jumonji/ARID domain-containing protein 2/Protein Jumonji | Q92833
TYW5 | tRNA wybutosine-synthesizing protein 5 | A2RUC4
FIH | Factor inhibiting HIF | Q9NWT6
HSPBAP1/PASS1 | HSPB1-associated protein 1 | Q96EW2
Hairless | Protein hairless | O43593
MINA53 | MYC induced nuclear antigen | Q8IUF8
NO66 | Nucleolar protein 66 | Q9H6W3


Extended Data Figure 10 | Phylogenetic relationships of human JmjC 2OG-dependent oxygenases. The figure shows a parsimony tree constructed using Archaeopteryx v.0.9812 (ref. 58) from ClustalW-aligned protein sequences of human JmjC-containing 2OG-dependent oxygenases, showing that distinct branches of JmjC-containing oxygenases exist for hydroxylases (red), demethylases/hydroxylases (light green) and demethylases (blue).
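Parsimony methods score a tree by the minimum number of character changes it requires. Purely as an illustration of that scoring idea (the legend does not describe the software's internals, and neither ClustalW nor Archaeopteryx is reimplemented here), the sketch below applies Fitch's small-parsimony algorithm to a single aligned column on a fixed, hypothetical four-taxon rooted binary tree. A full analysis would sum scores over all alignment columns and search over tree topologies.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree: no extra change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical taxa and the residue each carries at one alignment column.
column = {"t1": "H", "t2": "H", "t3": "S", "t4": "T"}
tree = (("t1", "t2"), ("t3", "t4"))
_, changes = fitch(tree, column)
print(changes)  # 2
```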
Co-opting sulphur-carrier proteins from primary
metabolic pathways for 2-thiosugar biosynthesis 


Eita Sasaki, Xuan Zhang, He G. Sun, Mei-Yeh Jade Lu, Tsung-lin Liu, Albert Ou, Jeng-yi Li, Yu-hsiang Chen, Steven E. Ealick & Hung-wen Liu


Sulphur is an essential element for life and is ubiquitous in living systems. Yet how the sulphur atom is incorporated into many sulphur-containing secondary metabolites is poorly understood. For bond formation between carbon and sulphur in primary metabolites, the major ionic sulphur sources are the persulphide and thiocarboxylate groups on sulphur-carrier (donor) proteins. Each group is post-translationally generated through the action of a specific activating enzyme. In all reported bacterial cases, the gene encoding the enzyme that catalyses the carbon-sulphur bond formation reaction and that encoding the cognate sulphur-carrier protein exist in the same gene cluster. To study the production of the 2-thiosugar moiety in BE-7585A, an antibiotic from Amycolatopsis orientalis, we identified a putative 2-thioglucose synthase, BexX, whose protein sequence and mode of action seem similar to those of ThiG, the enzyme that catalyses thiazole formation in thiamine biosynthesis. However,
no gene encoding a sulphur-carrier protein could be located in the 
BE-7585A cluster. Subsequent genome sequencing uncovered a few 
genes encoding sulphur-carrier proteins that are probably involved 
in the biosynthesis of primary metabolites but only one activating 
enzyme gene in the A. orientalis genome. Further experiments showed 
that this activating enzyme can adenylate each of these sulphur- 
carrier proteins and probably also catalyses the subsequent thiola- 
tion, through its rhodanese domain. A proper combination of these 
sulphur-delivery systems is effective for BexX-catalysed 2-thioglucose 
production. The ability of BexX to selectively distinguish sulphur- 
carrier proteins is given a structural basis using X-ray crystallography. 
This study is, to our knowledge, the first complete characterization 
of thiosugar formation in nature and also demonstrates the receptor 
promiscuity of the A. orientalis sulphur-delivery system. Our results 
also show that co-opting the sulphur-delivery machinery of primary 
metabolism for the biosynthesis of sulphur-containing natural pro- 
ducts is probably a general strategy found in nature. 

The unusual sugars that are found in many secondary metabolites have crucial roles in determining the efficacy and specificity of the biological activities of the parent molecules. Despite recent advances in research on the biosynthesis of unusual sugars, little is known about thiosugar formation, owing to the rarity of thiosugars in natural products and the limited knowledge about sulphur incorporation into secondary metabolites. We studied the biosynthesis of the 2-thiosugar-containing antibiotic BE-7585A (4, Fig. 1a) in A. orientalis subsp. vinearia BA-07585 and identified a putative 2-thioglucose-6-phosphate synthase, BexX. This enzyme has significant sequence homology to the thiazole synthase ThiG, which is responsible for construction of the thiazole moiety (8) from 1-deoxy-D-xylulose-5-phosphate (DXP, 6) in thiamine biosynthesis (Fig. 1b). ThiG catalyses sulphur insertion into a ThiG-ketosugar adduct (7); therefore, BexX may have a similar role in the conversion of glucose-6-phosphate (G6P, 1) to 2-thioglucose (5) in A. orientalis (Fig. 1a). The proposed function of BexX is supported by the detection of a covalent adduct (2) between BexX and a 2-ketosugar


derived from G6P. The crystal structure of the BexX-substrate complex has now been determined to 2.3 Å resolution (Extended Data Fig. 1), confirming that G6P is covalently attached to the lysine at position 110 (Lys 110) of BexX (Fig. 1c). However, the absence of genes encoding potential sulphur-transfer enzymes, including common sulphur-carrier proteins, cysteine desulphurases and rhodanese-like proteins, in and near the BE-7585A biosynthetic gene cluster impeded further functional characterization of BexX.

To search for the sulphur-carrier protein required for the BexX reac- 
tion, the entire genome of A. orientalis was sequenced. A total of 9,210 
coding open reading frames were identified in approximately 9.8 mega- 
bases of genomic DNA, including genes encoding five cysteine desulphu- 
rase homologues, five rhodanese homologues and four sulphur-carrier 
protein homologues (ThiS, MoaD, CysO and MoaD2) (Extended Data 
Table 1). The thiS, moaD and cysO genes are part of the thiamine, molybdopterin and cysteine biosynthetic gene clusters in A. orientalis, respectively, whereas moaD2 stands alone with no nearby genes associated with a biosynthetic pathway (Extended Data Fig. 2a-d and Supplementary Table 1). Although the protein receptor for MoaD2 is not immediately apparent, MoaD2 has high sequence homology to MoaD and therefore probably functions as a MoaD homologue. In view of the sequence similarity between BexX and ThiG and their mechanistic parallels, we anticipated that ThiS, being the cognate sulphur-carrier partner of ThiG, might be recruited for sulphur delivery to the BexX-G6P complex (2) in A. orientalis.

Unlike the thiamine biosynthetic gene clusters in Escherichia coli
and Bacillus subtilis, the gene cluster in A. orientalis does not contain
thiF, the gene that encodes the ThiS-activating enzyme, which is
essential for converting ThiS to its thiocarboxylate form (10). The cor- 
responding activating enzymes for MoaD and CysO are also missing 
from the respective molybdopterin and cysteine biosynthetic gene clus- 
ters in A. orientalis. To our surprise, only a single putative activating 
enzyme was found in the entire genome of A. orientalis: a MoeZ homo- 
logue with a ThiF-and-MoeB-like domain at the amino terminus and a 
rhodanese homology domain at the carboxy terminus (Extended Data 
Fig. 2 and Supplementary Table 1). Because it is unique in the genome 
and because it is not part of the molybdopterin gene cluster or assoc- 
iated with any other biosynthetic gene cluster, this protein, MoeZ, may 
be the universal activating catalyst for thiocarboxylate protein produc- 
tion in A. orientalis (Fig. 2a). 

To test the proposed function of MoeZ, the ThiS and MoeZ proteins 
of A. orientalis were heterologously expressed in E. coli, each with an 
N-terminal His6 tag. When ThiS was incubated with MoeZ and ATP,
an electrospray ionization—mass spectrometry (ESI-MS) signal corre- 
sponding to adenylated ThiS (9) was detected, together with a few peaks 
probably derived from the reaction of the labile adenylated ThiS with 
buffer components (Extended Data Fig. 3c). On addition of excess bisul- 
phide, complete conversion of 9 to its thiocarboxylate form (10) was 
observed (Extended Data Fig. 3d). Control experiments using bisulphide 
Figure 1 | Proposed mechanism for 2-thiosugar formation in BE-7585A biosynthesis. a, The active-site lysine residue (Lys 110) of BexX initially forms an imine bond with G6P (1) at the C1 position, which is isomerized first to a C1-C2 enamine and then to a C2-ketone intermediate (2) (the carbon atoms are numbered). Subsequent nucleophilic attack by a sulphur donor (red) occurs at the C2 position of 2, resulting in the incorporation of a sulphur atom in the 2-thio-G6P (2SG6P) (3) product. One arrow indicates one step, and two arrows indicate multiple steps between the transformations. b, ThiG-catalysed thiazole phosphate biosynthetic pathway. c, Stereo view of the BexX active site. The active-site side chains and the Lys 110-G6P intermediate are depicted as sticks, with carbon atoms coloured green (side chains) and purple (Lys 110-G6P intermediate), nitrogen in blue and oxygen in red. The Fo − Fc simulated-annealing omit map of the Lys 110-G6P intermediate contoured at 4σ is shown in grey. Water molecules are shown as red spheres. The carbon atoms of G6P are numbered C1 to C6. Dashed lines represent hydrogen bonds. Me, methyl.


in the absence of MoeZ showed no change in the original ThiS signals. These results demonstrated that MoeZ can charge ThiS to its ready-to-use thiocarboxylate form (Fig. 2a). The activated ThiS-COS⁻ was next incubated with the BexX-G6P complex (2). If sulphur transfer occurs and the resultant 2-thiosugar product (3) is released from the enzyme, a shift of the mass signal from that of the BexX-G6P complex to that of the free enzyme is anticipated (Fig. 2b). However, no increase in free BexX was observed in the presence of ThiS-COS⁻ (Fig. 2c-e), ruling out ThiS (and bisulphide) as the sulphur donor for BexX in 2-thiosugar formation.
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Figure 2 | Activation of sulphur-carrier proteins and sulphur transfer to the 
BexX-G6P complex. a, MoeZ-catalysed acyl adenylation of a sulphur-carrier 
protein (SCP)—for example, ThiS, MoaD, CysO or MoaD2—followed 
by nucleophilic attack of bisulphide, yielding the corresponding SCP- 
thiocarboxylate. b, Expected sulphur-transfer reaction to BexX-G6P (2) using 
SCP-thiocarboxylate to produce 2SG6P (3), which was further derivatized with 
mBBr to give 2SG6P-bimane (11). c-g, Deconvoluted ESI-MS analyses of
as-isolated C-His6-BexX (where C denotes carboxy terminal; calculated mass
(calcd), 28,488), showing that the major species is 2 (28,730 calcd) (c), and
of the sulphur-transfer reactions using ThiS (d), CysO (f) or MoaD2 (g). Panel e
shows a lower mass range of the ThiS reaction, with N-His6-ThiS-COSH (8,663 calcd)
and its N-gluconoyl derivatives (8,841 calcd), where N denotes amino terminal
(see also Extended Data Fig. 3). h, HPLC traces (left) for
BexX-catalysed reactions (right). In the reaction with ThiS, the amount of 
AMP, probably derived from partial decomposition of ATP during incubation, 
was comparable to that in the control with no added SCP. 


To assess the competence of MoaD, CysO and MoaD2 of A. orientalis 
as sulphur-carrier proteins in BexX-catalysed reactions, N-terminally 
His6-tagged MoaD, CysO and MoaD2 were prepared. Similar to the
case for ThiS, thiocarboxylation of each protein in the presence of MoeZ, 
ATP and sodium hydrosulphide (NaSH) was confirmed by mass spectrometric
analysis (Extended Data Fig. 3f-k). Because activated MoaD
was generated in small quantities at low purity, only CysO and MoaD2 
were incubated with the BexX-G6P complex. The relative intensities of 
the mass signals corresponding to BexX-G6P (2) and the free enzyme 
were monitored before and after the addition of the activated
sulphur-carrier proteins. Only the signal ascribed to free BexX was discernible after
treatment with CysO or MoaD2 (Fig. 2f, g). To gain further evidence, the 
thiosugar product (3) was derivatized with monobromobimane (mBBr) 
before high-performance liquid chromatography (HPLC) analysis to
yield 11, thereby facilitating detection (Fig. 2b). Indeed, when CysO or 
MoaD2 was used, a new peak (product peak) clearly appeared, together 
with an increase in AMP production (Fig. 2h, traces 3 and 4). The pro- 
duct peak was isolated and characterized as 2-thioglucose-6-phosphate- 
bimane (11) by ESI-MS and NMR spectroscopy (Supplementary Methods). 
Each assay sample was also treated with alkaline phosphatase, and the 
dephosphorylated product matched well with the synthetic standard 
(Extended Data Fig. 4). As expected, no thiosugar product was detected 
in the sample containing ThiS. These results firmly established that
BexX-catalysed 2-thiosugar formation was able to proceed in the presence of
either CysO-COS⁻ or MoaD2-COS⁻ but not ThiS-COS⁻. These two
examples clearly reveal the capability of some sulphur-delivery enzymes 
to ‘moonlight’ in natural product biosynthesis, bridging the biosyn- 
thetic pathways of primary and secondary metabolites. 

Thiocarboxylated sulphur-carrier proteins recognize their partners 
through specific protein-protein interactions. Because BexX has 37%
sequence similarity to ThiG and because the two enzymes are structurally
homologous, it was surprising to find that ThiS is not a sulphur-carrier
protein for BexX. To understand the sulphur-carrier protein
specificity of BexX, the BexX-CysO structure was determined to 2.6 Å
resolution (Fig. 3a and Extended Data Fig. 1f). We were unable to crys- 
tallize BexX-MoaD2; however, because of the compact ubiquitin-like 
fold and similar sizes of CysO (90 residues) and MoaD2 (96 residues), 
we were able to construct a reliable homology model for BexX-MoaD2 
using the BexX-CysO structure as a template. We also constructed a 
hypothetical model of BexX-ThiS using ThiS from Thermus thermo- 
philus (Protein Data Bank (PDB) ID, 2HTM) as a guide. 

CysO and MoaD2 superimpose well, with a root-mean-square deviation
(r.m.s.d.) of 0.1 Å for 80 Cα atoms. By contrast, CysO and ThiS show
significant differences, especially in the loop regions, with an r.m.s.d. of
2.7 Å for 43 Cα atoms (Fig. 3b and Extended Data Figs 1d, e and 5).
The most significant difference between ThiS (66 residues) and either 
CysO or MoaD2 is the insertion of two additional α-helices, which are
located at the BexX-sulphur-carrier protein interface (Extended Data
Fig. 5). As a result, the accessible surface area buried on complex
formation is ~1,000 Å² for BexX-CysO and BexX-MoaD2 but
only ~600 Å² for BexX-ThiS (Extended Data Figs 5 and 6a). CysO contributes
19 residues and BexX contributes 26 residues to the interface
of BexX-CysO, similar to the 16 residues contributed by MoaD2 and
the 23 by BexX in BexX-MoaD2. By contrast, only eight ThiS residues 
contribute to the interface in the BexX-ThiS model. Ten of the inter- 
face residues are conserved between CysO and MoaD2, but only four 
of these are conserved in ThiS (Extended Data Fig. 5a). The hydrogen- 
bonding scheme is also conserved between BexX-CysO and BexX- 
MoaD2 (Extended Data Fig. 6b-d). A comparison of the BexX-CysO 
complex and the B. subtilis ThiG-ThiS complex (PDB ID, 1TYG)
provides further insight. Superposition of BexX-CysO and ThiG-ThiS
results in an r.m.s.d. of 1.7 Å for the BexX-ThiG core (Fig. 3c);
however, CysO and ThiS do not overlay well (r.m.s.d. >40 Å). Thus,
even though the overall sulphur-carrier protein folds are similar, and 
even though each sulphur-carrier protein is positioned to insert its C- 
terminal tail into the active site of its partner (Extended Data Fig. 6e, f), 
the selection of CysO or MoaD2 by BexX is clearly determined by the
interface interactions.
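The r.m.s.d. values in this paragraph are obtained after optimally superposing matched Cα atoms. A minimal NumPy sketch of one standard way to do this, the Kabsch algorithm, is shown below; it assumes the two coordinate sets are already paired residue by residue (the alignment step is not shown), and the text does not state which program was used for these superpositions. The coordinates in the usage lines are random stand-ins, not BexX, CysO or ThiS data.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """r.m.s.d. between paired N x 3 coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two paired N x 3 coordinate arrays")
    p_c = p - p.mean(axis=0)  # remove translation by centring on centroids
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection; force a proper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c           # rotate p onto q, then compare
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage: a rotated and translated copy of the same 80 points superposes perfectly.
rng = np.random.default_rng(0)
ca_a = rng.normal(scale=10.0, size=(80, 3))      # hypothetical Cα positions (Å)
theta = np.deg2rad(35.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
ca_b = ca_a @ rz.T + np.array([4.0, -2.0, 7.0])
print(kabsch_rmsd(ca_a, ca_b))
```

The printed value is zero up to floating-point noise; real structural differences, such as the loop and helix insertions that separate CysO from ThiS, are what raise the r.m.s.d. above that floor.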

Finally, we also examined whether the C-terminal rhodanese domain 
of MoeZ has a role in sulphur transfer. In a typical rhodanese reaction, 
the conserved cysteine residue in rhodanese (for example, Cys 360 in 
MoeZ) is converted to a persulphide group in the presence of thiosulphate
or through the action of a cysteine desulphurase using L-cysteine
as the sulphur source. Because the resultant persulphide is a known
sulphur donor, it can be used to charge the adenylated sulphur-carrier
proteins to the thiocarboxylate forms (Fig. 4). To test this potential 
second role of MoeZ as a sulphur donor, MoeZ was incubated with 
CysO or MoaD2, first in the presence of ATP and thiosulphate (with 
no addition of reducing agent to prevent bisulphide formation). MoeZ 
Figure 3 | Structure of BexX-CysO from A. orientalis. a, A ribbon diagram
of the BexX-CysO heterotetramer generated using two-fold crystallographic 
symmetry. BexX’ and CysO’ are coloured in cyan and light brown, respectively. 
Secondary structural elements of BexX and CysO are coloured in blue and 
pink for α-helices, green and yellow for β-strands, and yellow and green for
loops, respectively. The C-terminal tail (AVAGG) of CysO is highlighted in red.
The helices α7 and α8 from BexX and BexX’ are labelled in pink and blue,
respectively. b, Stereo view diagram of the superposition of CysO (pink) and
ThiS (yellow). Secondary structural elements are labelled in black for CysO and
red for ThiS. The two major insertions of CysO, α1 and α2, are highlighted in
green. c, Comparison of the A. orientalis BexX-CysO dimer and the Bacillus 
subtilis ThiG-ThiS dimer. Monomers are coloured in blue for BexX, pink for 
CysO, grey for ThiG and yellow for ThiS. 


was observed to catalyse the thiolation of both CysO and MoaD2 but 
not when replaced with a MoeZ(Cys360Ala) mutant (Extended Data 
Fig. 7), which retained a similar level of adenylation activity to wild-type 
MoeZ (Extended Data Fig. 8). These observations are consistent with 
the C-terminal rhodanese domain of MoeZ being involved in sulphur 
transfer. Next, we assessed BexX-catalysed 2-thiosugar formation with 
MoeZ and MoaD2 in the presence of ATP, using either thiosulphate 
(Fig. 4a) or L-cysteine and a cysteine desulphurase (CD4, Extended Data
Table 1) from A. orientalis (Fig. 4b) as the primary sulphur sources. As 
expected, the 2-thioglucose product was detected in both cases in the 
absence of reducing agents (Extended Data Fig. 9). Taken together, 
these results support the probable dual role of MoeZ in catalysing both 
the adenylation and thiolation of sulphur-carrier proteins in A. orientalis. 

In summary, we carried out whole-genome sequencing of A. orientalis 
and demonstrated that the sulphur delivery for 2-thiosugar production 
in the biosynthesis of BE-7585A is achieved by hijacking the sulphur- 
transfer systems from primary metabolism. Although the overall reac- 
tion mechanism of 2-thiosugar formation resembles that of thiamine 
Figure 4 | Possible involvement of the rhodanese domain of MoeZ in
thiolation of sulphur-carrier proteins. The C-terminal rhodanese domain 
(RHOD) of MoeZ catalyses thiolation of the adenylated SCP. The sulphur 
source for charging the rhodanese domain can be from thiosulphate (a) or from 
L-cysteine mediated by a cysteine desulphurase (CD) (b). Nucleophilic attack 
(red dashed line) on the adenylated sulphur-carrier protein followed by 
intramolecular disulphide bond formation (with another cysteine residue in 
MoeZ) allows sulphur transfer from the persulphide group to SCP. The protein 
persulphide intermediate can be reduced to release bisulphide, which can 
also attack adenylated sulphur-carrier proteins (blue dashed line). To prevent 
such complications, the experiments were carried out in the absence of 
reducing agents. [Red], reduction; S39, the wild-type Cys 360 residue. 


biosynthesis, BexX cannot utilize the corresponding sulphur-carrier pro- 
tein, ThiS, from the thiamine pathway. Instead, the sulphur-carrier pro- 
teins that are probably involved in cysteine (CysO) and molybdopterin 
(MoaD2) biosynthesis are recruited to transfer their C-terminal thio- 
carboxylate sulphur to the BexX-G6P complex (2). Two structural snap- 
shots, of the BexX-G6P ketone intermediate (2) and the BexX-CysO 
heterotetramer, provide significant insight into the proposed sulphur 
incorporation mechanism, as well as the structural basis by which
sulphur-carrier proteins are selected. These results indicate that the functional
alliance between a sulphur-carrier protein and its acceptor protein is not
strictly specific but is not entirely random either. The assembly of operational
sulphur-transfer machinery from components of the sulphur-carrier systems of
primary metabolism, to deliver a sulphur atom to produce 2-thiosugars, 
is an efficient strategy for the biosynthesis of a relatively rare metabolite.
Such an ad hoc approach to sulphur transfer may be a paradigm for as 
yet undiscovered pathways of sulphur-containing natural product bio- 
synthesis. The revelation that MoeZ is the universal activating enzyme 
for all known sulphur-carrier proteins in A. orientalis is another signifi- 
cant finding. The presence of only a single ThiF-type enzyme in the entire 
genome has also been noted in several other microorganisms (Extended 
Data Table 2). The charging of multiple sulphur-carrier proteins in different
biosynthetic pathways by a single activating enzyme may be a common
phenomenon in nature (at least in the Actinomycetales). In
addition, the finding that functional pairs of sulphur-carrier proteins and 
their acceptor proteins are not necessarily located in the same gene cluster 
raises the possibility that some cryptic gene clusters in various genomes 
may encode pathways for the biosynthesis of sulphur-containing natu- 
ral products. Such a possibility has generally been overlooked in recent 
efforts to deconvolute genomic information. 


METHODS SUMMARY 


Whole-genome sequencing of Amycolatopsis orientalis was carried out at the High 
Throughput Sequencing Core Facility at Academia Sinica, Taiwan, using a 454 GS 
FLX Titanium analyser (Roche) and a Genome Analyzer IIx (Illumina). Contig extension
and genome annotation were carried out using Glimmer (Gene Locator and
Interpolated Markov ModelER) version 3.0 (ref. 25), tRNAscan-SE and RNAmmer.
The thiS, moaD, cysO, moaD2 and moeZ genes were PCR-amplified from A. orientalis
genomic DNA and ligated into a pET28b(+) vector (Novagen). The resultant
plasmids were used to overexpress the proteins in the Escherichia coli BL21 Star (DE3)
strain (Invitrogen), and the proteins were purified under conditions similar to those
previously described for preparing BexX. Each of the purified sulphur-carrier proteins
(50-90 μM) was subjected to ESI-MS analysis before and after incubation with 80 μM MoeZ and
5 mM ATP in 100 mM Tris-HCl buffer (pH 8.0) containing 5 mM MgCl2 in the
presence or absence of 10 mM NaSH. The corresponding sulphur-carrier protein
thiocarboxylates generated in situ were incubated with 100 μM BexX and 2 mM
G6P (1) in 50 mM NH4HCO3 buffer, pH 8.0, at 25 °C for 8 h to yield the
2-thio-D-glucose-6-phosphate product (3). The resultant reaction mixture was then added
to a solution of 5 mM mBBr in methanol to give 11 and subjected to HPLC ana- 
lysis. Crystals of BexX-G6P were grown from 40% (v/v) polyethylene glycol (PEG) 
300, 0.2 M calcium acetate and 0.1 M sodium cacodylate-HCl, pH 6.5, and crystals 
of BexX-CysO complexes were grown from 28% PEG 4000, 0.2 M Li2SO4 and 0.1 M
Tris, pH 8.0. Data were collected at the Cornell High Energy Synchrotron Source 
(CHESS) and the Advanced Photon Source (APS). The structures were determined 
by molecular replacement. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Whole-genome sequencing and analysis. Soybean-casein digest (TSB) medium 
(10 ml) was inoculated with spores of Amycolatopsis orientalis and incubated in a 
rotary incubator at 30 °C and 250 r.p.m. for 2 days. The resultant seed culture (4 ml) 
was transferred to 100 ml TSB medium and grown under the same conditions for 
2 days. The growth culture (25 ml) was centrifuged at 5,000g for 20 min at 4 °C, and 
the cells were washed with 25 ml 10 mM EDTA. After another centrifugation, the 
cells were stored at −80 °C until use. The cells were resuspended in 5 ml 1 mM EDTA,
and the suspension was divided into 1.5 ml tubes (0.4 ml each). The genomic DNA 
was extracted using the PureLink Genomic DNA Mini Kit (Invitrogen) according 
to the manufacturer’s instructions. The resultant DNA solution (0.5 μg μl−1, 50 μl
from each tube) was subjected to massively parallel sequencing using a 454 GS 
FLX Titanium analyser (Roche) and a Genome Analyzer IIx (Illumina) at the High 
Throughput Sequencing Core Facility at Academia Sinica, Taiwan. Primary assem- 
bly was carried out using 454 Newbler software (Roche). Contig extension and the 
closing of short gaps were achieved by scripts built in-house at the core. Genome
annotation was carried out using Glimmer (Gene Locator and Interpolated Markov
ModelER) version 3.0 (ref. 25), tRNAscan-SE and RNAmmer. Homologous protein
sequences were identified in the NCBI database using the Basic Local Alignment
Search Tool (BLAST).

Preparation of proteins. C-His6-BexX (where C denotes carboxy terminal) was prepared
as described previously. The thiS (orf13974), moaD (orf13839), cysO (orf06461),
moaD2 (orf10102), moeZ (orf02110) and cd4 (orf04763) genes were PCR-amplified 
from A. orientalis genomic DNA using primers with engineered NdeI and HindIII
restriction sites. The sequences of the primers are described in the Supplementary
Methods. The PCR-amplified gene fragments were purified, digested with NdeI
and HindIII and ligated into a pET28b(+) vector (Novagen) that had been digested 
with the same enzymes. For crystallization studies, the bexX gene was also
subcloned into pET28b(+) and produced as an N-terminally His6-tagged protein. In
addition, the cysO gene was subcloned into an IMPACT pTYB1 vector (New England 
Biolabs) that had been digested with NdeI and SapI for the production of CysO 
thiocarboxylate. The resultant plasmids were used to transform the Escherichia
coli BL21 Star (DE3) strain (Invitrogen) for protein overexpression. An overnight 
culture of E. coli transformants grown in 10 ml LB medium containing 50 μg ml−1
kanamycin at 37 °C was used to inoculate 1 l of the same growth medium. The
culture was incubated at 37 °C with shaking (230 r.p.m.) until the optical density at
600 nm (OD600) reached ~0.5. Protein expression was then induced by the
addition of isopropyl β-D-thiogalactoside (IPTG) to a final concentration of 0.1 mM,
and the cells were allowed to grow at 18 °C with shaking at 125 r.p.m. for an
additional 24 h. The cells were collected after centrifugation at 4,500g for 15 min and
stored at −80 °C until lysis. All purification steps were carried out at 4 °C using
nickel (Ni-NTA) resin according to the manufacturer’s protocol. The proteins were 
eluted using 250 mM imidazole buffer containing 10% glycerol, except those for 
crystallization studies. The collected protein solution was dialysed three times 
against 1 l 50 mM Tris-HCl buffer, pH 8, containing 300 mM NaCl and 15%
glycerol. The protein solution was then flash-frozen in liquid nitrogen and stored at
−80 °C until use. For crystallization studies, N-His6-BexX (where N denotes amino
terminal) eluted with 250 mM imidazole buffer was incubated with 2 mM G6P and
2 mM dithiothreitol for 1 h at 4 °C and then further purified in 10 mM Tris-HCl
buffer, pH 8.0, containing 50 mM NaCl by using a Superdex G200 column (GE 
Healthcare). In the case of cysO/pTYB1, the cell lysate was loaded onto a column of 
chitin beads (New England Biolabs, 10 ml) at a flow rate of 0.8 ml min−1. The
column was then washed with 15 column volumes of column buffer at a flow rate of
2 ml min−1. Intein-mediated cleavage of the protein was carried out at 18 °C for
12 h with 30 ml 50 mM dithiothreitol to yield CysO or with Na2S to yield
CysO-thiocarboxylate. CysO was further purified in 10 mM Tris-HCl buffer, pH 8.0,
containing 50 mM NaCl by using a Superdex G75 column (GE Healthcare). The
protein concentration was determined by the Bradford assay using bovine serum
albumin as the standard. The molecular mass and purity (>90%, except
N-His6-MoaD) of the proteins were estimated by SDS-PAGE analyses (Extended Data Fig. 3l).

ESI-MS analyses of proteins. The purified sulphur-carrier protein (that is,
N-His6-ThiS (90 μM), N-His6-MoaD (50 μM), N-His6-CysO (60 μM) or
N-His6-MoaD2 (90 μM)) in 5 mM or 100 mM Tris-HCl buffer, pH 8.0, was subjected to
ESI-MS analysis, which was carried out at the Mass Spectrometry core facility at 
the College of Pharmacy, University of Texas, Austin (Extended Data Fig. 3b, f-h). 
In the case of N-His6-ThiS, the protein was also incubated with 80 μM N-His6-MoeZ
and 5 mM ATP in 100 mM Tris-HCl buffer, pH 8.0, containing 5 mM MgCl2 at
30 °C for 0.5 h (Extended Data Fig. 3c). Additionally, 10 mM sodium hydrosulphide
(NaSH) was added to aliquots of the above solution, and the resultant solution was subjected
to ESI-MS analysis after incubation at 30 °C for 0.5 h (Extended Data Fig. 3d). As a
control, a reaction containing only 90 μM N-His6-ThiS and 5 mM NaSH in 100 mM
Tris-HCl buffer, pH 8.0, was similarly analysed (Extended Data Fig. 3e). The other
sulphur-carrier proteins (that is, N-His6-MoaD (50 μM), N-His6-CysO (60 μM)
and N-His6-MoaD2 (90 μM)) were separately incubated with 7 μM N-His6-MoeZ,
5 mM ATP and 10 mM NaSH in 50 mM Tris-HCl buffer, pH 8.0, containing
5 mM MgCl2 at 30 °C for 0.5 h and subjected to ESI-MS analysis (Extended
Data Fig. 3i-k). Finally, each sulphur-carrier protein (that is, N-His6-ThiS (90 μM),
N-His6-CysO (60 μM) or N-His6-MoaD2 (90 μM)) was also incubated with 100 μM
C-His6-BexX and 7 μM N-His6-MoeZ in 50 mM Tris-HCl buffer, pH 8.0, containing
5 mM ATP, 10 mM NaSH and 5 mM MgCl2 at 30 °C for 0.5 h. The resultant
solution was subjected to ESI-MS analysis (Fig. 2d-g).

2-Thiosugar formation and its detection. The typical BexX reaction mixture 
(50 μl) contained 100 μM C-His6-BexX, sulphur-carrier protein (30 μM
N-His6-CysO, 45 μM N-His6-MoaD2 or 45 μM N-His6-ThiS), 15 μM N-His6-MoeZ, 2 mM
ATP, 2 mM G6P and 5 mM NaSH in 50 mM NH4HCO3 buffer, pH 8.0, containing
5 mM MgCl2. The resultant reaction mixture was incubated at 25 °C for 8 h and
stored at −20 °C until analysis. To the reaction mixture (10 μl) prepared above was
added 10 μl 10 mM mBBr methanol solution (the final concentration of mBBr was
5 mM), which was incubated at 25 °C for 5 min. The mixture was centrifuged at
16,000g for 5 min to remove the precipitate, and 10 μl supernatant was transferred
to a new tube. The solution was evaporated in vacuo using a SpeedVac SC100
(Savant). The resultant residue was redissolved in 100 μl 50 mM NH4HCO3 buffer,
pH 8.0, and subjected to HPLC analysis using a CarboPac PA1 analytical column
(4 × 250 mm; Dionex). The sample was eluted with a gradient of water (solvent A)
and 1 M ammonium acetate (solvent B). The gradient was run from 5% to 15% B 
over 5 min, from 15% to 30% B over 15 min and from 30% to 100% B over 7 min, 
with a 5-min wash at 100% B, and from 100% to 5% B over 3 min, followed by
re-equilibration at 5% B for 5 min. The flow rate was 1 ml min−1 and the detector was
set at 260 nm (Fig. 2h). The peak corresponding to the enzymatic reaction product 
was isolated and subjected to ESI-MS and NMR analyses (see Supplementary Methods) 
for structural characterization. Alternatively, the reaction mixture stored at −20 °C
was thawed and treated with 0.2 μl calf intestinal alkaline phosphatase (CIP) (2 units)
and incubated at 37 °C for 1 h. The precipitate that appeared during the incubation
was removed by centrifugation at 16,000g for 2 min, and 2 μl 100 mM mBBr in
methanol was added to the reaction solution (the final concentration of mBBr was
5 mM). The resultant mixture was incubated at 25 °C for 5 min, and the
supernatant (5 μl) was diluted with deionized water (95 μl) before HPLC analysis using
an analytical C18 column (4 × 250 mm). The sample (20 μl) was eluted with a
gradient of water (solvent A) and 80% acetonitrile (solvent B). The gradient was run
from 5% to 30% B over 15 min, from 30% to 80% B over 5 min and from 80% to 5% 
B over 5 min, followed by re-equilibration at 5% B for 10 min. The flow rate was 
1 ml min−1, and the detector was set at 260 nm. The 2-thio-D-glucose-bimane
standard (0.1 mM) was prepared from the chemically synthesized 2-thio-D-glucose
incubated with mBBr at room temperature for 5 min. The peak corresponding to 
the enzymatic reaction product was also isolated and subjected to ESI-MS analysis 
(Extended Data Fig. 4). 
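Both HPLC methods above are piecewise-linear gradient programmes. The sketch below encodes a programme as (target %B, duration in min) segments and interpolates %B at any time, assuming linear ramps between breakpoints as is usual for HPLC gradient tables; the helper names `gradient_breakpoints` and `percent_b` are hypothetical.

```python
import numpy as np


def gradient_breakpoints(start_b, segments):
    """Turn (target %B, duration in min) segments into (time, %B) breakpoints."""
    t, b = 0.0, float(start_b)
    points = [(t, b)]
    for target_b, duration in segments:
        t += duration
        b = float(target_b)
        points.append((t, b))
    return points


def percent_b(points, t_min):
    """%B at time t_min, assuming linear ramps between breakpoints."""
    times, comps = zip(*points)
    return float(np.interp(t_min, times, comps))


# CarboPac PA1 programme (solvent B = 1 M ammonium acetate).
carbopac = gradient_breakpoints(5, [
    (15, 5),    # 5% -> 15% B over 5 min
    (30, 15),   # 15% -> 30% B over 15 min
    (100, 7),   # 30% -> 100% B over 7 min
    (100, 5),   # 5-min wash at 100% B
    (5, 3),     # 100% -> 5% B over 3 min
    (5, 5),     # re-equilibration at 5% B
])

# Reverse-phase programme for the dephosphorylated product (solvent B = 80% acetonitrile).
reversed_phase = gradient_breakpoints(5, [(30, 15), (80, 5), (5, 5), (5, 10)])

print(carbopac[-1][0], reversed_phase[-1][0])  # total run times (min)
print(percent_b(carbopac, 12.5))               # %B halfway through the second ramp
```

Writing the programme as segments rather than absolute times mirrors how the text describes it ("from 15% to 30% B over 15 min"), so total run length and the composition at any retention time fall out directly.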

Determination of rhodanese activity of MoeZ. The site-specific Cys360Ala mutant 
of MoeZ was constructed according to the manufacturer’s site-directed mutagenesis
protocol (Stratagene) using moeZ/pET28b(+) as a DNA template. The forward
primer (5'-GATCGTCCTGCACGCCAAGTCGGGC-3') and the reverse primer
(5'-GCGGGCGCCCGACTTGGCGTGCAGG-3') were used in the PCR amplification.
(The underlining indicates the site of mutation.) The resultant plasmid
moeZ(Cys360Ala)/pET28b(+) was used to transform the E. coli BL21 Star (DE3) 
strain for protein overexpression. The rhodanese activity of MoeZ was determined 
using a previously described assay. A typical assay mixture contained 50 mM
Tris-HCl, pH 8.0, 50 mM potassium cyanide, approximately 2 μM MoeZ or
MoeZ(Cys360Ala) and a variable amount of sodium thiosulphate (0-35 mM) in 100 μl. The
reaction was initiated by the addition of MoeZ and was quenched after 10-s
incubation at 25 °C by the addition of 50 μl reagent A (15% formaldehyde). Then,
150 μl reagent B (1 g Fe(NO3)3·9H2O and 2 ml 65% HNO3 in 13 ml H2O) was
added for colour development. Formation of SCN− in the reaction was quantified
using the extinction coefficient for Fe(SCN)3 (4,200 M−1 cm−1 at 460 nm). The
steady-state kinetic parameters were determined in triplicate by fitting the experi- 
mental data using the Michaelis-Menten equation (Extended Data Fig. 7e). The 
assay for protein thiocarboxylate formation was performed in an anaerobic cham- 
ber to minimize the oxidation of MoeZ or MoeZ(Cys360Ala). A typical reaction 
mixture contained 80 μM MoeZ or MoeZ(Cys360Ala), 4 mM ATP, 5 mM MgCl2,
5 mM Na2S2O3 in 50 mM HEPES, pH 8.0, with 500 mM glycerol (from enzyme
stock solution) and 100 μM of one of the sulphur-carrier proteins, N-His6-MoaD2
or N-His6-CysO. The reaction was incubated in the glove box at ~30 °C for 40 min
and then quenched by flash-freezing in liquid nitrogen. Samples were then ana- 
lysed by ESI-MS (Extended Data Fig. 7a-d). 
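The rhodanese assay turns an A460 reading into an SCN− concentration with the Beer-Lambert law and then fits initial rates to the Michaelis-Menten equation. A sketch using SciPy's `curve_fit` follows; the 1 cm path length, the optional dilution factor and the synthetic rate data are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

EPS_460 = 4_200.0  # M^-1 cm^-1 for the ferric thiocyanate colour, as quoted in the text


def scn_molar(a460, path_cm=1.0, dilution=1.0):
    """Beer-Lambert: concentration (M) = A / (epsilon * path length), times any dilution factor."""
    return np.asarray(a460, dtype=float) / (EPS_460 * path_cm) * dilution


def michaelis_menten(s, vmax, km):
    return vmax * s / (km + s)


def fit_michaelis_menten(s, v):
    """Least-squares fit of v = Vmax*S/(Km + S); returns (Vmax, Km, standard errors)."""
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    p0 = (v.max(), np.median(s))  # rough starting guesses
    popt, pcov = curve_fit(michaelis_menten, s, v, p0=p0)
    return popt[0], popt[1], np.sqrt(np.diag(pcov))


# Hypothetical thiosulphate series (mM) spanning the 0-35 mM range used in the assay.
thiosulphate_mM = np.array([0.5, 1, 2, 5, 10, 20, 35], dtype=float)
rates = michaelis_menten(thiosulphate_mM, 1.2, 4.0)  # synthetic, noise-free rates
vmax, km, errors = fit_michaelis_menten(thiosulphate_mM, rates)
print(f"Vmax = {vmax:.3f}, Km = {km:.2f} mM")
```

Dividing the fitted Vmax by the enzyme concentration (about 2 μM MoeZ in the assay above) gives kcat; with real triplicate data the returned standard errors become informative, whereas for the noise-free example they are essentially zero.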

Spectrophotometric analysis of the adenylation reaction catalysed by MoeZ 
and its Cys360Ala mutant. The adenylation of sulphur-carrier proteins catalysed 
by MoeZ and its Cys360Ala mutant was monitored using a coupled enzyme assay 
in the presence of an excess of NaSH (Extended Data Fig. 8a). The coupled enzyme 
reaction was monitored by detecting the consumption of NADH (ε340 = 6,220 M−1 cm−1)
at 340 nm. A typical reaction mixture (120 μl) contained 3 μM MoeZ or its Cys360Ala
mutant, 10 μM MoaD2, 80 μM ATP, 3 mM NaSH, 2 mM phosphoenolpyruvate,
0.16 mM NADH and ~1.5 units each of adenylate kinase, pyruvate kinase and
lactate dehydrogenase (LDH) in 50 mM NH4HCO3 buffer, pH 8.0, containing 2.5 mM
MgCl2. The reaction was initiated on addition of ATP at time 0, and the absorbance
at 340 nm was monitored for 30 s (Extended Data Fig. 8b).
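In this coupled assay, adenylate kinase converts each AMP (plus ATP) into two ADP, and pyruvate kinase and LDH then oxidize one NADH per ADP, so the A340 decrease reports AMP release at two NADH per AMP. A sketch of the conversion follows, assuming a 1 cm path length and a slope expressed per minute (the text states neither); the slope value is hypothetical.

```python
EPS_340_NADH = 6_220.0  # M^-1 cm^-1, NADH at 340 nm (value given in the text)
NADH_PER_AMP = 2        # AMP + ATP -> 2 ADP; each ADP drives oxidation of one NADH via PK and LDH


def amp_release_rate(slope_a340_per_min, path_cm=1.0):
    """AMP formation rate (uM/min) from the A340 slope (negative, because NADH is consumed)."""
    nadh_rate_molar = -slope_a340_per_min / (EPS_340_NADH * path_cm)
    return nadh_rate_molar / NADH_PER_AMP * 1e6


def apparent_turnover(slope_a340_per_min, enzyme_uM, path_cm=1.0):
    """Apparent turnover (per min) = AMP release rate / enzyme concentration."""
    return amp_release_rate(slope_a340_per_min, path_cm) / enzyme_uM


# Hypothetical slope of -0.0622 A340 units per min with 3 uM MoeZ, as in the mix above.
print(amp_release_rate(-0.0622))        # uM AMP per min
print(apparent_turnover(-0.0622, 3.0))  # per min
```

Forgetting the factor of two would overstate the adenylation rate twofold, which matters when comparing wild-type MoeZ with the Cys360Ala mutant.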

2-Thiosugar formation using other sulphur sources. A typical reaction mixture 
(40 μl) contained 15 μM C-His6-BexX, 18 μM N-His6-MoaD2, 90 or 100 μM
N-His6-MoeZ or its Cys360Ala mutant, 2.5 mM ATP, 2.5 mM G6P and different sulphur
sources ((i) 0.2 mM Na2S2O3, or (ii) 0.15 mM L-cysteine plus 25 μM cysteine
desulphurase CD4 with 0.25 mM pyridoxal 5'-phosphate) in 50 mM NH4HCO3 buffer,
pH 8.0, containing 5 mM MgCl2. The resultant reaction mixture was incubated
at 30 °C for 30 or 70 min for (i) and 0, 10, 20 or 50 min for (ii). The reaction was
quenched by adding an equal volume of acetonitrile, and mBBr was added to the 
collected supernatant (to a final concentration of ~2 mM). After incubation at 
25 °C for 30 min, the reaction mixture was dried by vacuum concentration. The
residue was re-dissolved in 50 mM NH4HCO3, pH 8.0, and treated with 3 units CIP
at 37 °C for 1.5 h. The CIP enzyme was removed by centrifugation after
precipitation with acetonitrile (at a final concentration of 50% (v/v)), and the collected
supernatant was dried by vacuum concentration. The residue was then dissolved in 40 μl
deionized water before HPLC analysis with a C18 column (4 × 250 mm). Each
sample (20 μl) was eluted with a gradient of water (solvent A) and acetonitrile
(solvent B). The gradient was run from 4% to 24% B over 15 min, from 24% to 64% 
B over 5 min and from 64% to 4% B over 5 min, followed by re-equilibration at 5% 
B for 8 min. The flow rate and the detector setting were as described above (1 ml min−1
and 260 nm, respectively). The 2-thio-D-glucose-bimane standard described above
(10, 25, 50, 77, 100 and 200 μM) was also injected onto the HPLC column for
calibration of the peak area (Extended Data Fig. 9).
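Quantifying the bimane product against the standard series is a linear calibration: fit peak area against concentration, then invert the line for unknowns. A NumPy sketch follows, assuming the detector response is linear over 10-200 μM; the peak areas are invented for illustration.

```python
import numpy as np

STANDARDS_UM = np.array([10, 25, 50, 77, 100, 200], dtype=float)  # standard series from the text


def fit_calibration(conc_uM, peak_area):
    """Least-squares line area = slope * conc + intercept; also returns r^2."""
    conc_uM = np.asarray(conc_uM, dtype=float)
    peak_area = np.asarray(peak_area, dtype=float)
    slope, intercept = np.polyfit(conc_uM, peak_area, 1)
    predicted = slope * conc_uM + intercept
    ss_res = np.sum((peak_area - predicted) ** 2)
    ss_tot = np.sum((peak_area - peak_area.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(peak_area, slope, intercept):
    """Concentration (uM) of an unknown from its peak area via the inverted calibration line."""
    return (np.asarray(peak_area, dtype=float) - intercept) / slope


# Hypothetical peak areas for the standards (arbitrary units).
areas = np.array([36.0, 89.5, 176.0, 272.0, 352.0, 701.0])
slope, intercept, r2 = fit_calibration(STANDARDS_UM, areas)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, r^2={r2:.4f}")
print(f"unknown: {float(quantify(140.0, slope, intercept)):.1f} uM")
```

Reporting r² alongside the fit is a quick check that the unknowns fall inside a range where the linear assumption holds.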

Crystallization of BexX-G6P. Crystals of BexX-G6P were grown using the vapour 
diffusion hanging drop method. A solution containing 10 mg ml−1 BexX in 10 mM
Tris, pH 8.0, and 50 mM NaCl was pre-incubated on ice with G6P (at a final
concentration of 2 mM) for about 1 h. Hanging drops were formed by mixing 1.5 μl
protein solution and 1.5 μl well solution containing 40% (v/v) PEG 300, 0.1 M
sodium cacodylate-HCl, pH 6.5, and 0.2 M calcium acetate. Rod-shaped crystals grew
in about 6 days to a maximum size of 0.2-0.3 mm × 0.1-0.2 mm. Preliminary X-ray
analysis showed that the crystals belonged to the space group P4₁22 or P4₃22
with unit cell dimensions of a = 168.9 Å and c = 42.4 Å. The Matthews coefficient,
assuming two monomers of BexX per asymmetric unit, was 2.8 Å³ Da−1,
corresponding to a solvent content of 56.1%.

Crystallization of the BexX-CysO complex. Crystals of BexX-CysO were grown using 
the vapour diffusion hanging drop method. Both CysO and CysO-thiocarboxylate 
were used for crystallization trials; however, CysO consistently yielded better crys- 
tals and was used for the structures reported here. The BexX-CysO complex was 
formed by pre-incubating 0.55 ml 15 mg ml−1 BexX and 1.0 ml 10 mg ml−1 CysO
in 10 mM Tris, pH 8.0, containing 50 mM NaCl for 1 h. Hanging drops were
formed by mixing 1.5 μl protein solution and 1.5 μl well solution containing 28%
PEG 4000, 0.1 M Tris, pH 8.0, and 0.2 M Li2SO4. Plate-shaped crystals appeared
within 5 days and grew to a maximum size of 0.5 mm × 0.4 mm × 0.02 mm in
about 2 weeks. The crystals belonged to space group I422 with unit cell dimensions
of a = 106.4 Å and c = 181.7 Å. The Matthews coefficient is 3.5 Å³ Da−1 assuming
one monomer of BexX and one monomer of CysO per asymmetric unit, corres- 
ponding to a solvent content of 64.5%. 
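The Matthews coefficients and solvent contents in the two paragraphs above follow from the cell dimensions: VM = Vcell/(Z·M), where Z is the number of asymmetric units per cell and M the protein mass per asymmetric unit, and the solvent fraction is approximately 1 − 1.23/VM (Matthews' relation, which assumes a protein partial specific volume of about 0.74 cm³ g⁻¹). A sketch, with Z taken from standard space-group tables; the helper names are hypothetical.

```python
# Asymmetric units per unit cell for the space groups named in the text.
ASU_PER_CELL = {"P4122": 8, "P4322": 8, "I422": 16}


def tetragonal_volume(a, c):
    """Unit-cell volume (Å^3) of a tetragonal cell (a = b, all angles 90°)."""
    return a * a * c


def matthews_coefficient(volume, z_asu, mass_per_asu):
    """VM in Å^3/Da."""
    return volume / (z_asu * mass_per_asu)


def solvent_fraction(vm):
    """Approximate solvent fraction from VM (protein fraction ~ 1.23 / VM)."""
    return 1.0 - 1.23 / vm


def implied_mass_per_asu(volume, z_asu, vm):
    """Invert VM = V / (Z * M) to recover the mass per asymmetric unit (Da)."""
    return volume / (z_asu * vm)


for label, a, c, group, vm in [("BexX-G6P", 168.9, 42.4, "P4122", 2.8),
                               ("BexX-CysO", 106.4, 181.7, "I422", 3.5)]:
    v = tetragonal_volume(a, c)
    z = ASU_PER_CELL[group]
    print(f"{label}: V = {v:,.0f} Å^3, solvent ≈ {solvent_fraction(vm):.1%}, "
          f"implied mass per ASU ≈ {implied_mass_per_asu(v, z, vm):,.0f} Da")
```

For BexX-G6P this reproduces the 56.1% solvent content and implies roughly 27 kDa per BexX monomer (two per asymmetric unit). For BexX-CysO, VM = 3.5 gives about 64.9% rather than the quoted 64.5%, presumably because the quoted VM is rounded.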

X-ray data collection and processing. X-ray diffraction data for BexX-G6P were 
collected at beamline A1 at the Cornell High Energy Synchrotron Source (CHESS) 
using a Quantum 210 charge-coupled device (CCD) detector (Area Detector
Systems Corporation, ADSC) with a crystal-to-detector distance of 200 mm and a
wavelength of 0.9767 Å. The data collection temperature was 100 K. A total of 180°
of data were collected with an oscillation range of 0.5° per frame and an exposure
time of 3 s per frame. Data for BexX-G6P-CysO were collected at the Northeastern
Collaborative Access Team (NE-CAT) beamline 24-ID-C at the Advanced Photon
Source (APS) using a Q315 CCD detector (ADSC). The wavelength was 0.9791 Å;
the data collection temperature was 100 K; and the detector distance was 400 mm. 
Individual frames were collected over a range of 180° using 1 s for each 1.0°. X-ray 
diffraction data were indexed, integrated, scaled and merged using the program 
HKL-2000 (ref. 31). 
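The collection parameters above translate into frame counts, total exposure and X-ray photon energy with simple arithmetic (E = hc/λ, with hc ≈ 12.398 keV·Å). A small sketch; the helper names are hypothetical.

```python
HC_KEV_ANGSTROM = 12.398  # Planck constant times speed of light, in keV·Å


def photon_energy_kev(wavelength_angstrom):
    """Photon energy (keV) for a given X-ray wavelength (Å)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def sweep_summary(total_deg, oscillation_deg, exposure_s):
    """Number of frames and total exposure time (s) for a rotation sweep."""
    n_frames = round(total_deg / oscillation_deg)
    return n_frames, n_frames * exposure_s


print(sweep_summary(180, 0.5, 3))   # BexX-G6P data set, CHESS A1
print(sweep_summary(180, 1.0, 1))   # second data set, APS 24-ID-C
print(f"{photon_energy_kev(0.9767):.2f} keV, {photon_energy_kev(0.9791):.2f} keV")
```

Both wavelengths correspond to photon energies of roughly 12.7 keV, and the finer 0.5° slicing at CHESS means twice as many frames per 180° sweep as the 1.0° frames at APS.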
Structure determination and refinement. The structure of BexX was determined
by molecular replacement using the program Phaser as implemented in the PHENIX
program package. A monomer of ThiG from Bacillus subtilis (PDB ID, 1TYG),
which shares 37% sequence similarity with BexX, was modified using Chainsaw in
the CCP4 suite to generate a search model. The initial molecular replacement
solution was refined to an Rwork of 40.0% and Rfree of 44.0%. All side chains were
added, and the model was manually adjusted using COOT. After several cycles
of refinement using PHENIX and REFMAC5 (ref. 36), G6P and water molecules
were added. The final model was refined to an Rwork of 19.1% and Rfree of 22.2%.
The Ramachandran plot shows 94.7% of residues in the most favourable regions 
and 5.3% in the allowed regions. No residues were in generously allowed regions 
or disallowed regions. The structure of BexX-CysO was determined by molecular 
replacement using a monomer of BexX from the BexX-G6P complex and a
monomer of CysO from Mycobacterium tuberculosis (PDB ID, 3DWM) as the search
models. An initial model, corresponding to one monomer of BexX and one
monomer of CysO, was generated by Phaser as implemented in PHENIX. Packing
analysis showed that a BexX-CysO dimer is formed by crystallographic twofold
symmetry. The initial refinement resulted in an Rwork of 33.8% and Rfree of 39.5%.
Subsequent cycles of model building in COOT and refinement in PHENIX and
REFMAC5 (ref. 36) resulted in a final Rwork of 19.6% and Rfree of 24.0%. The
Ramachandran plot shows that 90.0% of residues are located in the most favourable
regions, 9.7% in the allowed regions and 0.3% in the generously allowed regions. 
No residues were in the disallowed regions. 
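The refinement statistics above use the conventional crystallographic R-factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. R-free is the same quantity computed only over a test set of reflections withheld from refinement.

The Python sketch below is a minimal illustration of both quantities. It assumes the calculated amplitudes are already on the same scale as the observed ones; refinement programs such as PHENIX and REFMAC5 handle scaling and solvent corrections themselves. The toy amplitudes and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with Fcalc already scaled."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  is_free: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by their test-set flag and return (R_work, R_free)."""
    work: List[Tuple[float, float]] = []
    test: List[Tuple[float, float]] = []
    for fo, fc, free in zip(f_obs, f_calc, is_free):
        (test if free else work).append((fo, fc))
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (hypothetical); the third reflection belongs to the test set.
r_work, r_free = r_work_r_free([100.0, 200.0, 50.0, 80.0],
                               [90.0, 210.0, 55.0, 80.0],
                               [False, False, True, False])
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")  # R_work = 0.053, R_free = 0.100
```

Because the test reflections never influence the model, a large gap between R-free and R-work is the usual warning sign of overfitting.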
Extended Data Figure 1 | Structures of BexX and CysO from Amycolatopsis 
orientalis. a, A stereo ribbon diagram of the (βα)₈-barrel fold of BexX is shown 
from the top view. The α-helices, β-strands and loops are marked in blue, 
green and yellow, respectively. The ketone intermediate (2) formed by Lys 110 
and G6P is shown as sticks and coloured in purple. b, The typical secondary 
structure composition of the classical (βα)₈-barrel is shown as a topology 
model; the conserved Lys 110 is highlighted in red. c, The quaternary structure 
of BexX is shown as a ribbon diagram with two monomers coloured by chain. 
d, A ribbon diagram of CysO from the BexX-CysO structure. Secondary 
structural elements are coloured blue for α-helices, green for β-strands and 
yellow for loops. e, A topology diagram of CysO. f, Data collection and 
refinement statistics. One crystal was used for each of the two data sets. 

                              BexX-G6P                   BexX/CysO
Data collection
Space group                   P4₃2₁2                     I422
Cell dimensions
  a, b, c (Å)                 168.94, 168.94, 42.37      106.36, 106.36, 181.67
  α, β, γ (°)                 90.00, 90.00, 90.00        90.00, 90.00, 90.00
Resolution (Å)                46.90–2.25 (2.31–2.25)*    46.06–2.60 (2.67–2.60)
Rsym or Rmerge                0.064 (0.325)              0.145 (0.475)
I/σI                          31.18 (8.40)               10.06 (3.07)
Completeness (%)              100.0 (100.0)              99.60 (99.80)
Redundancy                    14.80 (14.50)              5.90 (5.20)
Refinement
Resolution (Å)                46.90–2.25                 46.06–2.60
No. reflections               29,855                     16,381
Rwork/Rfree                   19.11/22.21                19.56/24.03
No. atoms                     3,740                      2,543
  Protein                     3,597                      2,483
  Ligand/ion                  36 (G6P, Ca)               5 (SO4)
  Water                       107                        55
B-factors                     38.59                      46.83
  Protein                     39.11                      47.07
  Ligand/ion                  36.13                      32.04
  Water                       37.37                      36.97
R.m.s. deviations
  Bond lengths (Å)            0.0073                     0.0084
  Bond angles (°)             1.1923                     1.2306

*The values in parentheses are for the highest-resolution shell. 
Extended Data Figure 2 | Putative thiamine, molybdenum cofactor and 
cysteine biosynthetic genes found in A. orientalis and their proposed 
functions. a, Organization of the putative thiamine biosynthetic gene cluster 
and the proposed thiamine biosynthetic pathway in A. orientalis. *The genes 
encoding MoeZ and ThiL (one of the genes involved in thiazole biosynthesis) 
are not found in the gene cluster. The gene encoding the ThiS-activating 
enzyme, ThiF, is also absent from the genome. †Two genes encoding proteins 
homologous to ThiD are found in the gene cluster. b, Organization of the 
putative molybdopterin biosynthetic gene cluster and the proposed 
molybdenum cofactor biosynthetic pathway in A. orientalis. *The genes 
encoding MoeZ and MoeA are not found in the gene cluster. The gene encoding 
the MoaD-activating enzyme, MoeB, is also absent from the genome. 
c, Organization of the putative cysteine biosynthetic gene cluster and the 
proposed cysteine biosynthetic pathway in A. orientalis. *The gene encoding 
MoeZ is not found in the gene cluster. d, Organization near the moaD 
homologue, moaD2, found in the A. orientalis genome. e, Organization near 
moeZ in the A. orientalis genome and the conserved domains of MoeZ 
predicted by BLAST analysis. 
Extended Data Figure 3 | ESI-MS analyses of the MoeZ-catalysed activation 
of sulphur-carrier proteins and SDS-PAGE separation of the purified 
proteins. a, Reaction scheme of the MoeZ-catalysed activation of ThiS. 
b–e, Deconvoluted ESI-MS analyses of as-isolated ThiS (b), ThiS in the 
presence of MoeZ and ATP (c), ThiS in the presence of MoeZ, ATP and 
bisulphide (d) and ThiS in the presence of bisulphide (control) (e). The 
calculated molecular masses are shown as the neutral form in the upper right 
corner. Analysis of purified N-His6-ThiS (where N denotes amino terminal) 
shows two mass signals (observed (obsd), 8,646 and 8,824 Da) consistent with 
the calculated molecular masses of the recombinant protein in its native and 
N-gluconoylated forms (calcd, 8,647 and 8,825 Da). Gluconoylation of the 
N-terminal His6 tag is a known post-translational modification when expressing 
recombinant proteins in E. coli. Such a modification should not affect ThiS 
activity, because the predicted active site for ThiS is at the C terminus. Indeed, 
when N-His6-ThiS was incubated with N-His6-MoeZ and ATP, a mass spectrometric 
signal corresponding to adenylated N-His6-ThiS (9) was detected together with a few 
peaks that were probably derived from a reaction of the labile adenylated 
ThiS with buffer components (see c). f–k, Deconvoluted ESI-MS analyses of 
as-isolated MoaD (N-His6-MoaD, 105 amino acids; calcd, 11,022 Da) (f), 
as-isolated CysO (N-His6-CysO, 109 amino acids, and its N-gluconoylated 
derivative; calcd, 11,688 and 11,866 Da, respectively) (g), as-isolated MoaD2 
(N-His6-MoaD2, 115 amino acids, and its N-gluconoylated derivative; calcd, 
12,473 and 12,651 Da, respectively) (h), MoaD incubated with MoeZ, ATP and 
NaSH (N-His6-MoaD-COSH; calcd, 11,038 Da) (i), CysO incubated with 
MoeZ, ATP and NaSH (N-His6-CysO-COSH and its N-gluconoylated 
derivative; calcd, 11,704 and 11,882 Da, respectively) (j) and MoaD2 incubated 
with MoeZ, ATP and NaSH (N-His6-MoaD2-COSH and its N-gluconoylated 
derivative; calcd, 12,489 and 12,667 Da, respectively) (k). l, SDS-PAGE gel of 
purified sulphur-carrier proteins, MoeZ and CD4: N-His6-ThiS (85 amino 
acids, 8.7 kDa, lane 2), N-His6-MoaD2 (115 amino acids, 12.5 kDa, lane 3), 
N-His6-CysO (109 amino acids, 11.7 kDa, lane 4), N-His6-MoeZ (421 amino 
acids, 45.0 kDa, lane 5), N-His6-MoaD (105 amino acids, 11.0 kDa, lane 7) and 
N-His6-CD4 (417 amino acids, 43.3 kDa, lane 9). The molecular weight 
markers are 220, 160, 120, 100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 15 and 10 kDa 
(top to bottom, lanes 1, 6 and 8). The protein MoaD did not express well, and 
the partially purified protein solution contained significant amounts of 
endogenous proteins from the E. coli host. 
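The calculated masses in this legend follow from nominal mass shifts relative to the unmodified His6-tagged proteins:

- +178 Da for N-gluconoylation.
- +16 Da for converting the C-terminal carboxylate to a thiocarboxylate (S replacing O).
- +329 Da for an acyl-adenylate (AMP minus water).

The Python sketch below is our own illustration, and its names are hypothetical. It adds these rounded, standard chemical shifts to a base mass. For example, it reproduces 11,688 + 178 + 16 = 11,882 Da for N-gluconoylated N-His6-CysO-COSH. The glycerol-ester (+74 Da) and dehydration (−18 Da) shifts are included because those species appear in Extended Data Fig. 7.

```python
# Nominal mass shifts (Da), rounded to whole daltons, for modifications named in
# the legends. These are standard chemical values, not numbers from the paper.
MASS_SHIFTS_DA = {
    "gluconoyl": 178,       # N-gluconoylation of the His tag (+C6H10O6)
    "thiocarboxylate": 16,  # C-terminal -COOH -> -COSH (S replaces one O)
    "amp": 329,             # acyl-adenylate: +AMP (347) - H2O (18)
    "glycerol": 74,         # glycerol ester: +glycerol (92) - H2O (18)
    "dehydration": -18,     # loss of H2O
}


def expected_mass(base_mass_da: int, *modifications: str) -> int:
    """Add the nominal shift of each named modification to an unmodified mass."""
    return base_mass_da + sum(MASS_SHIFTS_DA[m] for m in modifications)


# Unmodified calculated masses quoted in the legend (Da).
CYSO, MOAD2 = 11_688, 12_473
print(expected_mass(CYSO, "gluconoyl", "thiocarboxylate"))  # 11882
print(expected_mass(MOAD2, "thiocarboxylate"))              # 12489
```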
Extended Data Figure 4 | BexX-catalysed 2-thio-D-glucose-6-phosphate 
formation followed by alkaline phosphatase treatment. a, Reaction 
scheme to synthesize the expected bimane derivative. b, HPLC traces of the 
C-His6-BexX-catalysed reactions (where C denotes carboxy terminal) using 
N-His6-ThiS, N-His6-CysO or N-His6-MoaD2, and the control reactions. 
The thiosugar product was treated with alkaline phosphatase (CIP) and then 
derivatized with mBBr. HPLC analysis of the synthetic standard of 
2-thio-D-glucose-bimane is shown in the bottom trace (trace 7). c, High-resolution 
ESI-MS (positive) of the isolated product peak (2-thio-D-glucose-bimane, 
C16H22N2NaO7S+). 
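The calculated and observed m/z values for panel c did not survive extraction, so none are given here. The Python sketch below illustrates how a theoretical value for such a peak is obtained. It sums standard monoisotopic atomic masses for a formula and subtracts the electron mass for a singly charged cation.

It assumes the product ion is the sodium adduct C16H22N2NaO7S+. That formula is our reading of the garbled text: 2-thio-D-glucose (C6H12O5S) joined to the bimane group (C10H11N2O2) through a thioether. The function names are hypothetical. For that formula the sketch gives m/z ≈ 409.104.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "Na": 22.9897692809,
    "S": 31.97207100,
}
ELECTRON_MASS = 0.00054857990946  # u


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C16H22N2NaO7S' (no brackets)."""
    counts: dict = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mz(formula: str, charge: int = 1) -> float:
    """m/z of a cation with the given elemental formula and positive charge."""
    neutral = sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())
    return (neutral - charge * ELECTRON_MASS) / charge


print(f"{monoisotopic_mz('C16H22N2NaO7S'):.4f}")  # 409.1040
```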
Extended Data Figure 5 | Sequence alignment of A. orientalis CysO, MoaD2 
and ThiS and hydrophobic interactions for BexX complexes. a, Sequence 
alignment was based on structural superposition using the programs 
3D-Coffee, MultAlin and ESPript. The main differences between CysO 
(or MoaD2) and ThiS result from an insertion of ten residues between β1 and 
β2 of CysO (or MoaD2) and an insertion of 14 (or 15) residues between α1 and 
β3 of CysO (or MoaD2). The first insertion includes the short helix 3₁₀1, 
and the second includes helix α2. Both of these insertions are involved in the 
BexX-CysO (or BexX-MoaD2) interface. Ten interface residues (red stars and 
red triangles) are conserved between CysO and MoaD2; however, only four 
of these residues are conserved in ThiS (red triangles). Two differences between 
CysO and MoaD2 are conservative with respect to the interface: although Thr 9 
and Ala 86 in CysO are replaced by Gly 11 and Ser 92 in MoaD2, the interface 
interactions at these positions are hydrogen bonds formed by backbone atoms. 
b–d, Hydrophobic interactions of BexX-CysO (b), BexX-MoaD2 (c) and 
BexX-ThiS (d). BexX monomers are shown as grey ribbon diagrams with 
hydrophobic interaction regions coloured in cyan. CysO, MoaD2 and ThiS are 
shown as cartoons and coloured in green, blue and yellow, respectively. 
Hydrophobic interaction regions in sulphur-carrier proteins are coloured in 
red. The α-helices and β-strands in BexX and the sulphur-carrier proteins are 
labelled in black and red, respectively. 
Extended Data Figure 6 | The A. orientalis BexX-CysO interface, predicted 
hydrogen bonds between BexX and sulphur-carrier proteins, and a 
comparison of the BexX-CysO interface with the Bacillus subtilis 
ThiG-ThiS interface. a, Interacting surfaces of BexX (left) and CysO (right). 
The surface is colour coded by atom type (oxygen, red; nitrogen, blue; carbon, 
green). Non-interacting surfaces are shown in grey. b, Hydrogen bonds on 
the surface of BexX with CysO are shown as black dashes. c, Hydrogen bonds 
formed by the C-terminal tail of CysO and the surrounding residues from BexX 
are shown as black dashes. The Fo − Fc simulated annealing omit map of the 
C-terminal residues (Ala-Val-Ala-Gly-Gly) is rendered in grey and contoured 
at 3.0σ. Residues are shown as sticks with the carbon atoms in grey for BexX 
and green for CysO. CysO residues are labelled in red; BexX residues are 
labelled in black. d, Predicted hydrogen bonds between BexX and other 
sulphur-carrier proteins. The hydrogen-bonding scheme for the BexX-CysO 
complex (9 of the 12 hydrogen bonds involve the C-terminal tail) is conserved in 
the model of the BexX-MoaD2 complex. e, The interface between BexX (blue) and 
CysO (pink). Secondary structural elements of CysO are labelled in black; the β2 
and α2 elements in BexX are labelled in red. f, The interface between ThiG (grey) 
and ThiS (yellow) from B. subtilis. Secondary structural elements of ThiS 
are labelled in black, and the β2 and α2 elements in ThiG are labelled in red. 
The β2-α2 loop region in BexX and ThiG is highlighted in red. For CysO, 3₁₀1 
and α2 form hydrophobic contacts with the β2-α2 loop and α2 of BexX. 
ThiG also uses its β2-α2 loop to interact with ThiS; however, ThiS uses two 
different loop regions to form the interface. In addition, the β2-α2 loop of BexX 
is closer to the (βα)₈-barrel than in ThiG, in which the β2-α2 loop extends 
outwards and covers the top of ThiS. 
N.D. = not detected. 

Extended Data Figure 7 | MoeZ-dependent protein thiocarboxylate 
formation in sulphur-carrier proteins using thiosulphate as the sulphur 
source. a–d, Deconvoluted ESI-MS of MoaD2 incubated with MoeZ 
(the observed peaks are consistent with the calculated molecular masses of 
N-His6-MoaD2-COSH (12,489 Da), N-His6-MoaD2-glycerol (12,547 Da) 
and N-gluconoylated-His6-MoaD2-COSH (12,667 Da)) (a), MoaD2 incubated 
with the MoeZ(Cys360Ala) mutant (the observed peaks are consistent 
with the calculated molecular masses of N-His6-MoaD2 (12,473 Da) and 
N-His6-MoaD2-glycerol (12,547 Da)) (b), CysO incubated with MoeZ 
(the observed peaks are consistent with the calculated molecular masses of 
N-His6-CysO-COSH (11,704 Da), N-His6-CysO-glycerol (11,762 Da) and 
N-gluconoylated-His6-CysO-COSH (11,882 Da)) (c), and CysO incubated 
with the MoeZ(Cys360Ala) mutant (the observed peaks are consistent 
with the calculated molecular masses of N-His6-CysO (11,688 Da), 
N-His6-CysO-glycerol (11,762 Da), their N-gluconoylated derivatives 
(11,866 and 11,940 Da, respectively) and N-His6-CysO-AMP (12,017 Da)) 
(d). Observed masses corresponding to protein thiocarboxylate are shown in 
red. Two peaks corresponding to the dehydration of N-His6-MoaD2 and 
N-His6-CysO were probably caused by in-source collision-induced 
dissociation (CID) during the ESI-MS analysis. e, Kinetic parameters for the 
thiosulphate:cyanide sulphur transferase activity of MoeZ from A. orientalis. 
Bovine liver rhodanese is a typical rhodanese enzyme. Compared with bovine 
rhodanese, human molybdopterin synthase sulphurase (human MOCS3) 
displayed much lower thiosulphate:cyanide sulphur transferase activity. 
In the case of human MOCS3, L-cysteine and cysteine desulphurase, rather than 
thiosulphate, are proposed as the physiological sulphur source because of the 
low rhodanese activity of MOCS3. However, this may not be the case for MoeZ 
from A. orientalis, because its rhodanese activity is comparable to that of bovine 
liver rhodanese. 
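The kinetic parameters in panel e are not reproduced in this text. Suppose they are apparent Michaelis–Menten constants (Km and Vmax, with kcat = Vmax/[E]) for one substrate at a fixed concentration of the other. That is an assumption on our part. The Python sketch below shows one way such constants can be estimated from initial rates: a linear fit of the double-reciprocal (Lineweaver–Burk) form, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax.

The example data are synthetic and the function names are hypothetical. With real, noisy rates a direct nonlinear fit is generally preferable, because the reciprocal transform distorts the errors.

```python
from typing import Sequence, Tuple


def michaelis_menten_rate(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate: Sequence[float],
                        rates: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of 1/v against 1/[S]; returns (Vmax, Km)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals Km / Vmax
    intercept = mean_y - slope * mean_x   # equals 1 / Vmax
    vmax = 1.0 / intercept
    return vmax, slope * vmax


# Synthetic, noise-free rates generated with Vmax = 10 and Km = 2 (arbitrary units).
SUBSTRATE = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
RATES = [michaelis_menten_rate(s, vmax=10.0, km=2.0) for s in SUBSTRATE]
vmax, km = lineweaver_burk_fit(SUBSTRATE, RATES)
print(f"Vmax ~ {vmax:.3f}, Km ~ {km:.3f}")
```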
Extended Data Figure 8 | Relative adenylation activity of MoeZ and the 
MoeZ(Cys360Ala) mutant. a, Reaction scheme for the MoeZ-catalysed 
adenylation activity assay. The adenylation activities of MoeZ and its 
Cys360Ala mutant were inferred using a colorimetric assay that monitors the 
production of AMP (indicated by a decrease in NADH absorbance at 340 nm) when 
MoeZ or its Cys360Ala mutant was co-incubated with a sulphur-carrier protein 
(MoaD2) in the presence of ATP, NaSH, adenylate kinase (AK), pyruvate 
kinase (PK) and lactate dehydrogenase (LDH). b, The relative adenylation 
activity of MoeZ (open circles) and its Cys360Ala mutant (filled circles), as well 
as a control lacking both MoeZ and MoeZ(Cys360Ala) (open squares), was measured 
by the coupled enzyme assay described in a. Little difference in the decrease in 
absorbance at 340 nm was observed between MoeZ and its Cys360Ala mutant 
(compared with the control with no MoeZ), suggesting that the mutation at 
Cys360 had little effect on the adenylation activity of MoeZ. 
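The conversion from absorbance to AMP in this coupled assay rests on its stoichiometry:

- Adenylate kinase converts AMP + ATP into 2 ADP.
- Pyruvate kinase converts each ADP back to ATP while producing pyruvate. This step needs phosphoenolpyruvate, which the legend does not list, so we assume it here.
- Lactate dehydrogenase reduces each pyruvate with one NADH.

Each AMP therefore consumes two NADH. The Python sketch below applies this with the molar absorptivity of NADH at 340 nm (6,220 M−1 cm−1) and an assumed 1-cm path length. The function names and the example readings are hypothetical.

```python
# Molar absorptivity of NADH at 340 nm (standard value).
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1


def amp_formed_uM(delta_a340: float, path_length_cm: float = 1.0,
                  nadh_per_amp: int = 2) -> float:
    """Micromolar AMP implied by a decrease in A340 (Beer-Lambert law)."""
    nadh_consumed_M = abs(delta_a340) / (NADH_EPSILON_340 * path_length_cm)
    return nadh_consumed_M / nadh_per_amp * 1e6


def net_amp_uM(delta_sample: float, delta_control: float, **kwargs) -> float:
    """AMP attributable to the enzyme after subtracting a no-enzyme control."""
    return amp_formed_uM(delta_sample, **kwargs) - amp_formed_uM(delta_control, **kwargs)


# Hypothetical readings: A340 falls by 0.622 with enzyme and by 0.0622 without.
print(f"{net_amp_uM(0.622, 0.0622):.1f} uM AMP")  # 45.0 uM AMP
```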
Extended Data Figure 9 | BexX-catalysed 2-thiosugar formation using 
various sulphur sources. a, b, Reaction scheme for C-His6-BexX-catalysed 
2-thiosugar formation using N-His6-MoeZ, N-His6-MoaD2 and thiosulphate 
(a) or L-cysteine and the cysteine desulphurase (CD4) from A. orientalis (b). 
The reactions were carried out in the absence of reducing agent to avoid 
complications from the generation of bisulphide from protein persulphide 
(*see also below). Under these conditions, MoeZ cannot be regenerated after a 
single turnover. The thiosugar product was derivatized with mBBr and then 
treated with alkaline phosphatase (CIP) to yield 2-thio-D-glucose-bimane 
(2SG-bimane). c, d, The 2SG-bimane product concentrations at different time 
points of incubation with thiosulphate (c) or L-cysteine and CD4 (d) as the 
sulphur source were estimated on the basis of the product peak area of each 
HPLC trace. The 2SG-bimane synthetic standard (10, 25, 50, 77, 100 and 
200 μM) was used for calibration. The filled and open circles denote product 
formation from the incubation with N-His6-MoeZ and the 
N-His6-MoeZ(Cys360Ala) mutant, respectively. *The observed minor product 
formation with the MoeZ(Cys360Ala) mutant, L-cysteine and CD4 (see d, open 
circles) is probably caused by the formation of bisulphide, which could be 
generated on reduction of CD4-persulphide in the presence of free cysteine 
molecules. In fact, a small amount of bisulphide was detected under similar 
conditions with L-cysteine and CD4 (in the absence of other proteins and 
reducing agents) by the methylene blue assay within 15 min of incubation. 
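Estimating product concentration from HPLC peak area, as described for panels c and d, amounts to fitting a calibration line to the standards and inverting it. The Python sketch below uses ordinary least squares on the stated standard concentrations (10, 25, 50, 77, 100 and 200 μM). The peak areas in the example are invented, the linear detector response is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_calibration(conc_uM: Sequence[float],
                    areas: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line: area = slope * concentration + intercept."""
    n = len(conc_uM)
    mean_x = sum(conc_uM) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in conc_uM)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc_uM, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate an unknown concentration (uM)."""
    return (area - intercept) / slope


# Standard concentrations from the legend; the peak areas are hypothetical.
STANDARDS_UM = [10, 25, 50, 77, 100, 200]
HYPOTHETICAL_AREAS = [1.5 * c + 2.0 for c in STANDARDS_UM]

slope, intercept = fit_calibration(STANDARDS_UM, HYPOTHETICAL_AREAS)
estimate = concentration_from_area(92.0, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimate:.1f} uM")
```

Unknowns should fall within the 10–200 μM range of the standards; extrapolating beyond it assumes linearity that the calibration did not test.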
Extended Data Table 1 | Putative cysteine desulphurases, rhodaneses and sulphur-carrier proteins found in the A. orientalis genome 

Gene     Protein  Name of the protein with the highest sequence     Identity/        Protein
(orf #)           similarity (protein origin: Amycolatopsis          similarity (%)   accession number
                  mediterranei U32 for all entries)
10706    CD1      cysteine desulphurase/selenocysteine lyase         97/98            YP_003765029
11099    CD2      cysteine desulphurase                              87/91            YP_003765163
14916    CD3      cysteine desulphurase                              88/93            YP_003766645
04763    CD4      cysteine desulphurase                              92/96            YP_003763873
09299    CD5      cysteine desulphurase                              97/99            YP_003762467
04658    RHO1     rhodanese-like protein                             89/93            YP_003763825
08287    RHO2     rhodanese-like protein                             89/91            YP_003764615
09090    RHO3     rhodanese-like protein                             95/96            YP_003762363
10524    RHO4     rhodanese-like protein                             81/87            YP_003763440
12151    RHO5     rhodanese-like protein                             97/99            YP_003771104
02110    MoeZ     molybdopterin biosynthesis-like protein MoeZ       99/99            YP_003763336
13974    ThiS     thiamin biosynthesis protein ThiS                  88/93            YP_003770674
13839    MoaD     ThiS/MoaD family protein                           82/90            YP_003770615
06461    CysO     ThiS/MoaD family protein                           97/100           YP_003769822
10102    MoaD2    ThiS/MoaD family protein                           91/94            YP_003764220
Extended Data Table 2 | BLASTP (protein BLAST) analysis of E1-like proteins in genomes of selected strains of the Actinomycetales

Family                 Name of bacterial strain                                 E1-like protein (protein accession number)
Streptomycetaceae      Streptomyces coelicolor A3(2)                            MoeZ (NP_629326)
Streptomycetaceae      Streptomyces avermitilis MA-4680                         MoeZ (NP_824258)
Streptomycetaceae      Streptomyces griseus subsp. griseus NBRC 13350           MoeZ (YP_001823859)
Streptomycetaceae      Streptomyces cattleya NRRL 8057                          MoeZ (YP_004913535)
Streptomycetaceae      Streptomyces violaceusniger Tu 4113                      MoeZ (YP_004811516)
Mycobacteriaceae       Mycobacterium tuberculosis H37Rv                         MoeZ (YP_177942)
                                                                                MoeB (YP_177929)
                                                                                Rv2338c (NP_216854)
                                                                                Rv1355c (NP_215871)
Mycobacteriaceae       Mycobacterium bovis AF2122/97                            MoeZ (NP_856876)
                                                                                MoeB (NP_856788)
                                                                                Mb2366c (NP_856015)
                                                                                Mb1390c (NP_855044)
Mycobacteriaceae       Mycobacterium avium subsp. paratuberculosis K-10         MoeZ (YP_962240)
                                                                                MoeY? (YP_960282)
Mycobacteriaceae       Mycobacterium abscessus ATCC 19977                       MoeZ (YP_001704255)
                                                                                E1 family (YP_001702828)
Corynebacteriaceae     Corynebacterium glutamicum ATCC 13032                    MoeZ? (NP_599461)
                                                                                MoeZ? (NP_601246)
Corynebacteriaceae     Corynebacterium jeikeium ATCC 43734                      MoeZ (ZP_05845904)
Nocardiaceae           Nocardia farcinica IFM 10152                             MoeZ (YP_120782)
                                                                                nfa49170 (YP_121133)
Nocardiaceae           Rhodococcus jostii RHA1                                  MoeZ (YP_706296)
                                                                                RHA1_ro05752 (YP_705688)
Pseudonocardiaceae     Saccharopolyspora erythraea NRRL 2338                    MoeZ (YP_001103327)
Pseudonocardiaceae     Amycolatopsis mediterranei U32                           MoeZ (YP_003763336)
                                                                                MoeB? (YP_003764340)
                                                                                AMED_1241 (YP_003763458)
                                                                                E1 family (YP_003766676)
Frankiaceae            Frankia alni ACN14a                                      MoeZ (YP_716188)
                                                                                HesA? (YP_716927)
                                                                                HesA2? (YP_716575)
                                                                                E1 family (YP_711122)
                                                                                E1 family (YP_715173)
Micrococcaceae         Arthrobacter chlorophenolicus A6                         MoeZ (YP_002488550)
Microbacteriaceae      Clavibacter michiganensis subsp. michiganensis NCPPB 382 MoeZ (YP_001223108)
Micromonosporaceae     Micromonospora aurantiaca ATCC 27029                     MoeZ (YP_003838663)
                                                                                E1 family (YP_003839126)
Micromonosporaceae     Salinispora arenicola CNS-205                            MoeZ (YP_001535374)
                                                                                E1 family (YP_001536912)
                                                                                E1 family (YP_001538897)
Nocardioidaceae        Nocardioides sp. JS614                                   MoeZ (YP_922675)
Propionibacteriaceae   Microlunatus phosphovorus NM-1                           MoeZ (YP_004573609)


CORRECTIONS & AMENDMENTS 


CORRIGENDUM 
doi:10.1038/nature13488 


Corrigendum: Sea-level and 
deep-sea-temperature variability 
over the past 5.3 million years 


E. J. Rohling, G. L. Foster, K. M. Grant, G. Marino, A. P. Roberts, 
M. E. Tamisiea & F. Williams 


Nature 508, 477-482 (2014); doi:10.1038/nature13230 


In this Article, owing to a misunderstanding of discussions at the 
PALSEA2 workshop in Rome, we erroneously reported previous sea-level 
estimates for the period 3.3–2.9 Myr as originating from the ‘Pliocene 
Maximum Sea Level’ (PLIOMAX) project. However, these estimates are 
not from PLIOMAX; they relate instead to ref. 3. We thank M. E. Raymo 
and A. Rovere for drawing the error to our attention. The online versions 
of the paper have been corrected. 


CORRIGENDUM 
doi:10.1038/nature13533 


Corrigendum: Fuel gain exceeding 
unity in an inertially confined 
fusion implosion 

O. A. Hurricane, D. A. Callahan, D. T. Casey, P. M. Celliers, 
C. Cerjan, E. L. Dewald, T. R. Dittrich, T. Döppner, D. E. Hinkel, 
L. F. Berzak Hopkins, J. L. Kline, S. LePape, T. Ma, 
A. G. MacPhee, J. L. Milovich, A. Pak, H.-S. Park, P. K. Patel, 
B. A. Remington, J. D. Salmonson, P. T. Springer 
& R. Tommasini 


Nature 506, 343-348 (2014); doi:10.1038/nature13008 


In the legend to Fig. 2 of this Letter, we should have acknowledged the 
X-ray and neutron imaging as follows: X-ray image analysis (refs 1, 2) was 
performed by N. Izumi, S. Khan, L. R. Benedetti, R. Town and D. Bradley 
of the NIF Shape working group of Lawrence Livermore National Laboratory, 
California, USA, and by authors T.M. and A.P. Neutron images were measured 
(ref. 3) and analysed (refs 4, 5) by D. Fittinghoff of the Lawrence Livermore 
National Laboratory, California, USA, and by G. Grim, N. Guler, 
F. Merrill, C. Wilde and P. Volegov of the Physics Division at the Los 
Alamos National Laboratory, New Mexico, USA. 


1. Glenn, S. et al. A hardened gated x-ray imaging diagnostic for inertial confinement 
fusion experiments at the National Ignition Facility. Rev. Sci. Instrum. 81, 10E539 
(2010). 

2. Ma, T. et al. Imaging of high-energy x-ray emission from cryogenic thermonuclear 
fuel implosions on the NIF. Rev. Sci. Instrum. 83, 10E115 (2012). 

3. Merrill, F. E. et al. The neutron imaging diagnostic at NIF. Rev. Sci. Instrum. 83, 
10D317 (2012). 

4. Volegov, P. et al. Neutron source reconstruction from pinhole imaging at the 
National Ignition Facility. Rev. Sci. Instrum. 85, 023508 (2014). 

5. Grim, G. P. et al. Nuclear imaging of the fuel assembly in ignition experiments. Phys. 
Plasmas 20, 056320 (2013). 
LABORATORY CAREERS 


Catalysts for 
efficient science 


A good lab manager can smooth the running of a laboratory, 
saving time and money. 
BY HELEN SHEN 


For James Nelson, a molecular and cellular physiologist at Stanford University in 
California, having a lab manager means having more time for science. He does not 
worry about whether his laboratory is on track to meet its monthly budget, nor does 
he fret about upcoming chemical-safety inspections. Kathy Siemers, his long-time lab 
manager, handles those responsibilities for him, along with a long list of other 
duties, to keep the laboratory running smoothly. “I can’t imagine the lab without 
Kathy,” says Nelson. “It would be a disaster, and it wouldn’t be as much fun.” 

Not all laboratories can afford lab managers in today’s funding climate; tight 
research budgets in countries such as the United States, France and the United 
Kingdom have left many researchers unable to hire new staff. But principal 
investigators (PIs) who are able to support a lab manager can reap long-term gains 
in time and money that more than justify the investment. Median lab-manager 
salaries are about US$49 to $59 per hour in the United States and about £23 ($39) 
an hour in London, according to data from Kelly Services, a staffing firm in Troy, 
Michigan. 

Early-career PIs often hire lab managers using some of their university start-up 
funds. More-senior faculty members typically depend on research grants to support 
their managers. Rarely, if ever, does a US university pay a lab manager directly. 

But a savvy lab manager, no matter how they are funded, can reduce research costs 
by hunting for the best deals on supplies or reagents and by monitoring overall 
spending. PIs who have lab managers may spend considerably less time on 
administrative tasks, such as placing orders or filing regulatory paperwork, and 
more time on developing or running experiments. 

Many lab managers, who might have pursued the position as a career or as a prelude 
to a PhD, also help to run experiments and assist postdoctoral researchers or 
graduate students with time-consuming research steps, such as tissue culture or 
animal care. 

Different laboratories will require lab managers with different strengths, and 
smaller labs might require only part-time help. But whether full time or part time, 
the job requires organized individuals with strong communication skills, technical 
expertise and a knack for multitasking. A good lab manager can become a long-term 
collaborator and a repository for lab expertise: whereas students and postdocs 
cycle into and out of a laboratory every few years, the lab manager remains. 

Nelson hired Siemers, who had been a lab technician for a number of years, in 1987 
to help manage his first laboratory at the Fox Chase Cancer Center in Philadelphia, 
Pennsylvania. In 1990, Siemers moved across the country to help Nelson to start his 
new laboratory at Stanford. As the lab group has grown, so have Siemers’ 
responsibilities. 

In addition to helping scientists with their experiments, she trains new postdocs 
and students on lab procedures, discusses experimental results with team members 
and makes sure that the laboratory meets safety regulations. She helps Nelson to 
plan his annual budget and manages day-to-day spending on multiple projects funded 
by different grants. 

“It allows me to be more productive with things that I enjoy doing,” says Nelson. 
“I can worry about teaching and grant writing and talking science to people in my 
lab.” 

Large, established laboratories such as Nelson’s are not the only ones that can 
benefit from a lab manager’s help; an effective manager can enable a fledgling 
laboratory to get off the ground quickly. At the University of California, Davis, 
evolutionary biologist Santiago Ramirez hired a lab manager in October 2013, just 
months after launching his laboratory. He found Cheryl Dean’s CV through an 
electronic mailing list and was attracted by her experience in population genetics 
and chemical ecology as a research technician. Dean is now mentoring one 
undergraduate student and one lab technician, and will probably take on more 
training responsibilities as the laboratory grows. 

Bradley Voytek also hired a lab manager last year, soon after landing his first 
tenure-track faculty position at the University of California, San Diego. The 
cognitive neuroscientist was faced with moving 800 kilometres from Berkeley, 
California, with a toddler in tow and a baby on the way, but he was determined to 
start running experiments as soon as possible. 

In advance of his arrival in San Diego in March, Voytek hired Torben Noto through a 
job advertisement, and the new lab manager helped to process much of the regulatory 
paperwork needed to start human brain-scanning studies. Noto will help to collect 
and analyse data from these experiments, and he will work on turning Voytek’s 
data-analysis code into an online, open-access resource. 

Despite a bumpy few years for US government research funding, skilled lab managers 
remain in demand across academia and industry, says Jamie Stacey, a vice president 
at Kelly Services. By 2019, the market for lab managers is projected to grow by up 
to 6% in some US cities, according to data from Kelly and the labour-analysis firm 
Economic Marketing Specialists International in Moscow, Idaho. In the United 
Kingdom, the market is projected to grow by about 2.3%. 

PIs, government science agencies, biotechnology companies and private research 
institutes often look for managers with bachelor’s degrees in science fields and 
extensive experience working in laboratories, frequently as research technicians 
(see Nature 473, 545–546; 2011). Graduate degrees are relatively rare among lab 
managers, says Stacey. But for some PhD-educated scientists, lab management can 
serve as an alternative career to becoming an academic researcher (see ‘Back at the 
bench’). 


BACK AT THE BENCH 


An alternative to the tenure track 


A lab-manager job is as much about fulfilling professional goals as it is about 
maintaining a laboratory. For some PhD-trained scientists who become disenchanted 
with a future as an academic researcher, the role can be a different way to stay 
close to science. 

Laura Berkowitz became a lab manager at the University of Alabama in Tuscaloosa 
after seven years as an assistant professor elsewhere. She enjoyed working with her 
students, but she found that teaching left too little time for her research on the 
nematode Caenorhabditis elegans. 

In 2006, Berkowitz saw a lab-manager job advert posted by Guy and Kim Caldwell at 
the University of Alabama. The team, whose laboratory uses C. elegans as a model 
organism, thought she was a great fit for the job. “We don’t view Laura as someone 
who manages things in our lab,” says Kim Caldwell. “She’s really a research 
colleague.” 

Since 2007, Berkowitz has helped to design experiments and has run her own projects 
while mentoring students, maintaining worm strains and more. She can now balance lab 
work with teaching in a way that she could never do as a faculty member. 

Anthony Popkie, a postdoctoral researcher studying cancer at the Van Andel Institute 
in Grand Rapids, Michigan, will also leave the tenure track for a job as a lab 
manager at the end of June. Popkie had doubts about pursuing a faculty post, so he 
jumped at the chance to take on a support role in the laboratory of a prominent 
oncology researcher. He says that lab managing will allow him to continue to do the 
research he loves without the stress of starting his own laboratory. H.S. 

Lab-management duties can vary between laboratories, and between university and 
corporate settings. Managers in industry, for example, might spend less time on 
administrative tasks, such as purchasing or equipment maintenance, which many 
companies handle through centralized systems. 

The US National Institutes of Health (NIH) in Bethesda, Maryland, does not have a 
‘lab manager’ designation, but many NIH labs employ the equivalent: technicians or 
staff scientists who help to run experiments, stock lab supplies and maintain safety 
compliance. At the Janelia Farm research campus of the nonprofit Howard Hughes 
Medical Institute (HHMI) in Virginia, lab managers are called ‘lab coordinators’ and 
are paid directly by the HHMI. They act as intermediaries between researchers and 
the institute’s operational departments, which place orders, repair equipment and 
maintain safety standards. 

Before becoming a lab manager at Davis, 
Dean worked for nearly 20 years as a research 
technician studying fish population genetics 
for the US National Oceanic and Atmospheric 
Administration in Santa Cruz, California, and 
the Washington Department of Fish and Wildlife
in Olympia. She found her current job after
her family relocated to the Davis area. Like 
many other lab managers, Dean says that her 
career path has involved more serendipity than 
explicit planning. 

But the position suits her well. “I enjoy the
opportunity to learn new techniques and to
work with different people,” she says.

Whether a laboratory head decides to hire a
career lab manager or an aspiring graduate
student is a personal choice, says Voytek.
Inspired by his own experiences as a lab
manager before attending graduate school, he
was interested in training someone else.

“It was a good way of seeing if this was
what I wanted to do for the rest of my life,” he
says. Now with a lab of his own, he says that
his background in managing the nitty-gritty
details of a research operation made him a
more efficient scientist. “I think it’s an
incredibly valuable experience,” he says.


Helen Shen is a freelance writer in Mountain 
View, California. 
EMANCIPATION


BY JOÃO RAMALHO-SANTOS


As she stumbled into the secure underground
conference room clutching a stack of
overflowing files, Clara mentally tried to
organize her presentation. The narrative of
what she had uncovered had to be convincing,
hopefully even triumphant: one of her guilty
pleasures as a scientist.

Clara tackled all her assignments for the 
Center for Disease Control with the same 
overzealous principles. Diseases are
biological revolts. And, as in all recorded
uprisings throughout history, inaugural events
often go unnoticed or are explained away as
anomalies. To understand anything, one needs to
know its reasons, its beginning; search the 
unheralded edges rather than the full-blown 
epicentre to which everyone is attracted. 

In this particular case, that had involved
looking beyond the sudden rise in diverse
symptoms: what seemed on the one hand to
be an unholy alliance between early-onset
cardiac failure and neurodegenerative
disorders, and on the other to be simultaneous
severe bleeding in the digestive and urinary
tracts.

The first question was answered before
any of the wild-eyed caffeine-driven
scientists in the room had time to ask: Clara's
initial meta-analysis had clearly shown that
the two sets of symptoms were related. It was
just a question of which came first, what caused
what, and how aggressive haemorrhaging and
early-onset ageing were connected worldwide
by a phenomenon that did not distinguish east
from west, north from south, poor from rich,
old from young, male from female. In fact, it
was not even partial to any vertebrate species
in particular.

“Is it some sort of super-bug?”

The metallic voice came from the conference
screen Clara had failed to notice on the
far wall. Unidentified stern faces in crisp 
uniforms stared out at her as the scientists 
in the room went uncannily quiet, shrinking 
into their lab coats. Clara had expected to 
be summoned to a meeting with important 
people wearing medals and dark sunglasses 
some time after the presentation. Apparently,
they were speeding things up.

“Well, sort of,” Clara replied, wishing for a
better-thought-out PowerPoint. “In the past
few months, microbiology journals have
described an unprec-
The price of freedom.


well-explored habitats. The genomes of 
these bacteria are uncannily similar to the 
mitochondrial DNA of a variety of species,
but have acquired what seem to be typically
nuclear genes from those very same species.” 

“And that’s the bug that’s causing this?”
the screen interrupted, the collected faces
clearly twitching with anticipation. A
cacophony of questions followed. “Is it
man-made?” “Where did it originate?” “Which
enemy countries should the drones target?”
“Is this evolution gone wrong?” “Can we
develop a vaccine?” “How long before priority
personnel can be inoculated?”

Clara snapped, shushing the covert guardians
of the free world as if they were petulant
teenagers. Definitely not a good career move,
although she couldn't really say what bothered
her most: the disaster-movie simplicity, or the
fact that she wasn't allowed to deliver her story
as she’d intended. So she jumped to the end,
desperately trying to bring up at least some
data on the projector.

“Pathology reports have noted that 
mitochondria are the only well-preserved 
organelles in the leaking blood of human 
patients,” she said. “In fact, they seem to have 
lost their outer membrane — only the inner, 
bacteria-like one, remains.” 

Blank looks from the screen. 

“Mitochondria used to be bacteria,” Clara
continued. “And new bacteria that look like
weird mitochondria are appearing. This is
not a coincidence. Furthermore, dozens of
recent biochemistry and cell-biology papers
show that all mitochondrial functions are
unexpectedly reduced in different cell
cultures, animal models and biopsies. Oxidative
phosphorylation and ATP production,
mitochondria-dependent apoptosis, relationships
with the endoplasmic reticulum, oxidative
stress, mitochondrial fusion/fission ...

“Analysed individually, these are just weird
data sets; taken together, it seems obvious
that the first eukaryote organelle general
strike is under way. In other words, it’s not a
‘bug’. After millennia of successful
endosymbiosis within eukaryote cells, mitochondria
are making a bid for independence.”

A few of her colleagues shifted
uncomfortably in their seats.

“First they took back genes that migrated
to the nucleus throughout evolution in order
to regain full autonomy. Then they started
to disengage from their cellular functions.
Now they are leaving the host cells, initially
compromising only the function of organs that
need them most, which explains the more
visible symptoms.

“The outer membrane, originally inherited
from the host, is left behind as a broken
shackle. Clearing the easiest possible paths
to the exterior, mitochondria are starting
life as new bacteria. Apparently, they thought
that ‘endosymbiosis’ was just a fancy word
for ‘slavery’.

“This is not your average jihad; it is
intracellular mutiny. So we need to
stop thinking about bombing the problem,
and instead discuss the unique task forces
that will have to be assembled.”

“Such as?”

“Bioinformatics and systems-biology
experts, together with intelligence
communication specialists, should work out some sort
of code to transmit messages, to try to parley
the mitochondria into staying. But maybe
negotiating a forgiving exit strategy is more
reasonable. In that case, metabolomics gurus,
working with supply officers and
nutritionists, had better put their heads together to
figure out how humans are to survive on
glycolysis alone. And if bioengineers could
start designing artificial mitochondria with
no free will, that might also prove useful.”

“Can we really survive this?”

“Take it easy,” Clara said with a tired smile,
“we might be OK. As long as the centrioles or
the Golgi don't start getting any funny ideas.”